Atari 2600 programming techniques are fascinating. The hardware is so very limited, and programmers must use every possible trick to scrimp and save the last CPU cycle and byte of RAM. For a refresher on the 2600 hardware, check out my previous Atari overview. The console’s longevity is remarkable, with new games and demos still being produced today – over 100 new titles last year alone. The quest for the ultimate Atari 2600 programming techniques has continued all this time, in order to wring out maximum performance.
Atari programming requires racing the beam, updating graphics registers just before the instant they’ll be used to paint the next pixels on the current scan line. With only 76 CPU cycles per scan line, there just isn’t enough time for the poor 6502 to do very much. Want to update the foreground color multiple times at different horizontal positions? OK, but there might not be enough time remaining to also update the pixel bitmap during the scan line, or set the sprite positions. It’s a series of difficult tradeoffs and code optimization puzzles.
More Hardware Makes Things Better?
To create better-performing Atari games, programmers may need to think outside the box – literally. The 2600 console consists of a 6502 CPU (actually a 6507), a 6532 RIOT, and an Atari custom graphics chip called TIA. But what about the game cartridge? What’s inside that? The Atari’s designers envisioned cartridges as simple 4KB ROMs in a protective plastic shell, but there’s no reason aside from cost why there couldn’t be other hardware inside a cartridge too.
Within a few years after the Atari’s 1977 release, game publishers began including a small amount of 7400-type glue logic or a simple PLD inside game cartridges, in order to assist with bank switching. The Atari only provides 4KB of address space for game cartridges, but with this extra hardware inside, publishers were able to make 8KB and 16KB games using custom bank-switching schemes. Some later cartridges also included additional RAM to augment the paltry 128 bytes available in the Atari. This provided more storage space to create larger games, but didn’t help improve the Atari’s CPU or graphics performance.
Pitfall II and the DPC Coprocessor
Activision’s Pitfall was one of the most popular games for the Atari 2600. For the sequel Pitfall II, expectations were high, and the game’s designer David Crane did something that had never been done before: put a full-blown coprocessor inside the game cartridge. Crane’s DPC (Display Processor Chip) added two additional hardware sound channels, a music sequencing capability, a hardware random number generator, and a graphics streaming capability with built-in options for masking, reversing, or swizzling the bits.
That’s an appealing collection of features, but the graphics streaming capability is probably the most interesting. To understand how it can help improve graphics performance, let’s first look at some hypothetical code without DPC. Imagine that the game code wants to read bytes from a graphics table in ROM and stuff them into the TIA’s GRP0 register (the pixel bitmap for sprite 0) as quickly as possible, in order to create different patterns at points on the scan line. The code might look like this:
; setup
lda #>DrawData ; an address in ROM
sta GfxPtr ; a pointer in RAM page 0
lda #<DrawData
sta GfxPtr+1
; kernel
lda (GfxPtr),X ; 5
sta GRP0 ; 3
inx ; 2
lda (GfxPtr),X ; 5
sta GRP0 ; 3
inx ; 2
lda (GfxPtr),X ; 5
sta GRP0 ; 3
This requires a total of 10 CPU cycles for every repetition of read, write, and increment of the index register. During those 10 CPU cycles, the electron beam will move 30 pixels horizontally, so in this example it would only be possible to update the pixel bitmap every 30 pixels. This limits the number of objects or patterns that can be displayed in the same horizontal row of the screen.
The DPC provides some extensions to speed up this process. From my study of the chip, it works like this:
- Writing to a special pair of addresses in the cartridge’s address space will set up a graphics stream pointer
- Reading from a special address will return the next byte from the stream pointer, and increment the pointer
Using the DPC, the previous example could be rewritten something like this:
; setup
lda #>DrawData ; an address in ROM
sta BytePtr ; special addr in cartridge address space
lda #<DrawData
sta BytePtr+1
; kernel
lda NextByte ; 4 - special addr in cartridge address space
sta GRP0 ; 3
lda NextByte ; 4
sta GRP0 ; 3
lda NextByte ; 4
sta GRP0 ; 3
This reduces the total number of CPU cycles for every repetition from 10 down to 7, enabling the pixel bitmap to be updated every 21 pixels instead of only every 30 pixels – quite a nice improvement.
Pitfall II was released in 1984. Despite the major effort that doubtless went into designing the DPC chip, Pitfall II was the only game to ever use it. No other Atari 2600 games from the original Atari era 1977-1992 ever included a DPC or other coprocessor. The DPC was an impressive one-off, and remained the pinnacle of Atari 2600 hardware acceleration tech for 25 years, until rising interest in retrogaming eventually rekindled interest in the Atari console.
21st-Century Hardware
The Harmony Cartridge for Atari 2600 was first released in 2009. Harmony is a bit like my BMOW Floppy Emu Disk Emulator, but for Atari game cartridges instead of Apple floppy disks. Inside the cartridge is an ARM microcontroller and an SD card reader, and the hardware loads Atari game ROMs from the SD card and then emulates a standard ROM chip, including any necessary bank switching logic or extra RAM that may have existed in the original game cartridge.
The original function of Harmony was simply to be a multi-ROM cartridge, but for Pitfall II, Harmony’s designer was also able to use the ARM microcontroller to emulate David Crane’s DPC chip. Before long, he and a few collaborators began to share ideas for a new coprocessor design, like DPC but even more powerful, and emulated by Harmony’s ARM chip. While this wouldn’t do anything to benefit the library of original Atari games, it would open up new possibilities for the active community of Atari 2600 homebrew game developers. The coprocessor design they eventually created is called DPC+.
I haven’t looked at DPC+ in detail, so I may be misstating the specifics, but I’ve seen three features that look particularly interesting.
FastFetch – Atari programs often include time-critical sequences that load a byte from zero page RAM (3 CPU cycles), write the byte to a TIA register (3 CPU cycles), and repeat. Six total CPU cycles per iteration. For drawing fixed graphics patterns, the load from zero page RAM can be replaced with a load of an immediate value into a register, which reduces the total to five clock cycles but means the graphics data must a constant. That constant value is part of the program, which is stored in ROM, which is emulated by the Harmony cartridge. Harmony is able to perform a bit of trickery here, so when the program performs an immediate load of the constant value, Harmony actually supplies data from the graphics stream instead. This makes it possible to draw dynamic graphics data while still enjoying the five-cycle performance of drawing constant graphics data.
Bus Stuffing – The data bus output drivers on the NMOS 6502 (and 6507) chip are non-symmetric. They drive logical 0 bits strongly to near 0 volts, but logical 1 bits use a weak pull-up to 5 volts. If the CPU is outputting a 1 on the data bus, external hardware can safely overdrive the bus with a 0 without damaging the CPU – something that would be dangerous with a symmetric push-pull output driver. For time-critical code loops, Harmony uses this technique to eliminate CPU graphics data reads entirely. The CPU program just performs a long series of write instructions, repeatedly writing the value $FF (binary 11111111) to TIA registers, while Harmony pulls down the appropriate data bus lines to create the next byte of the graphics stream. This reduces the number of CPU cycles needed per iteration to a mere three cycles.
ARM Code Execution (ACE) – Still not fast enough? Harmony also enables the Atari’s 6502 program to upload and execute arbitrary code on Harmony’s 32-bit 70 MHz ARM microcontroller. Compared to the 8-bit 6502 running at 1 MHz, that’s a dramatic performance improvement for compute-heavy code.
Aside from DPC+, Harmony also supports other coprocessor models called CDF and CDFJ, which are further extensions of DPC+. I haven’t looked at these.
Harmony is no longer the only player in this space, and since 2018 some interesting alternatives have appeared. UnoCart 2600 is conceptually very similar to Harmony, but is open source and uses a more capable microcontroller. To my understanding, UnoCart 2600 supports bus stuffing and ACE, although ACE code for Harmony isn’t directly compatible with the UnoCart 2600 due to the differing memory maps of the microcontrollers on the two cartridges. UnoCart 2600 does not support other DPC+ extensions, which I think is because Harmony is a closed source design and its maintainers have chosen not to share the tech details needed for DPC+ emulation. Most recently and intriguingly, PlusCart has appeared as an UnoCart 2600 spin-off that loads game ROMs over WiFi instead of from an SD card. It’s the golden age of retro-Atari products.
When is an Atari no longer an Atari?
Thanks to Harmony and similar devices, within roughly the past 10 years, there has emerged a large and growing library of “hardware accelerated” homebrew Atari games. With the extra support of the coprocessor in the Harmony or UnoCart cartridge, these games are able to create graphics and sound that are much more impressive than anything that was possible back in the 1980s.
This all leads to the question I’ve been grappling with. When an Atari 2600 game console from 1977 is hot-rodded with a 32-bit coprocessor running at 100+ MHz, is it still an Atari 2600? Should games that target this augmented hardware still be considered Atari 2600 games?
Here’s my small editorial. There are no right or wrong answers about this, and everyone should be encouraged to do whatever they find to be most fun. From a technical standpoint I find Harmony and its siblings to be impressively clever, fascinating to study, and I wish I’d thought of these ideas myself. And the multi-ROM cartridge capability is extremely convenient for any modern game collector who doesn’t want to store hundreds of physical game cartridges.
As for the DPC+ coprocessor and similar hardware acceleration extensions, they make me feel… conflicted? Uneasy? When it comes to retrocomputing, I’m something of a hardware purist. For me, the appeal of the Atari 2600 comes from trying to make the most out of its limited hardware capabilities. By adding new 21st-century hardware with new performance-enhancing capabilities, it’s effectively moving the goalposts. I can compare two Atari games that target the original hardware, and if the first game looks visually more impressive than the second, I’ll know the first programmer did a better job. But if one of the games uses a coprocessor with extra hardware capabilities, then it’s no longer a fair comparison. Eventually I hope to try writing my own Atari game, and I’ll probably target the original hardware without any coprocessor extensions, so that I can fairly compare my efforts against the classic Atari games of the 1980s.