Recently I have been working on getting the 6502 to talk to the coprocessor, so that a region of the 6502's address space is mapped to characters on the screen. In this blog post I'll review some alternative ideas that I considered before settling on the final design.
Time is of the essence
The ATmega has 16 kilobytes of internal SRAM which is really fast (it can read or write a byte in 2 cycles), but there is no way to read data fast enough from an external memory chip during PAL video generation. Therefore, the only possibility is to transfer the needed bytes from the external SRAM chip to the internal SRAM during the scanlines when the ATmega is not outputting pixels. The screen mode I'm using has 256 lines of vertical resolution and a progressive PAL frame has a total of 312 lines, so luckily I have plenty of free scanlines to do the memory transfer. I figured the best time to do it is during the top border area, which consists of 32 blank scanlines before the visible image starts.
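As a quick sanity check of the scanline numbers above, here is a small Python sketch. It only restates the figures from the text; the function names are made up for illustration.

```python
# Scanline budget for one progressive PAL frame, using the figures quoted above.
TOTAL_LINES = 312        # lines in a progressive PAL frame
VISIBLE_LINES = 256      # vertical resolution of the screen mode
TOP_BORDER_LINES = 32    # blank lines before the visible image starts


def free_scanlines(total=TOTAL_LINES, visible=VISIBLE_LINES):
    """Lines per frame during which no pixels are being output."""
    return total - visible


def remaining_after_top_border(total=TOTAL_LINES, visible=VISIBLE_LINES,
                               top=TOP_BORDER_LINES):
    """Non-visible lines left over once the top border is accounted for."""
    return free_scanlines(total, visible) - top


if __name__ == "__main__":
    print(free_scanlines())              # 56
    print(remaining_after_top_border())  # 24
```

So there are 56 non-visible lines per frame, of which the top border takes 32.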
But it's not that simple. The ATmega can't just go and peek and poke the memory at any time, because the 6502 is executing and accessing the memory all the time. There are at least three ways to solve this problem. First, there are dual port memory chips that can deal with memory accesses from two sources at the same time. This kind of memory is more expensive and was not used in the microcomputers of the 80s. I felt that this design would not be in the spirit of the 80s micros, and besides, I already have my regular 628128 128K x 8 SRAM chip plugged in, so I dropped this idea.
The VIC-20 and C64 solved this problem cleverly by timesharing. The internal architecture of the 6502 is not pipelined and it can only access memory when its clock signal is high. The VIC-I and VIC-II graphics chips in the VIC-20 and C64 take advantage of this and access the memory when the clock signal is low. This is really neat because both chips can think that they own the memory all the time. The disadvantage of this approach is that the video chip and the 6502 are executing in lockstep. Basically the clock frequency of the 6502 in these systems is fixed to about 1 MHz and trying to change this will mess up the video chip timing badly. I considered implementing this idea and I think it could very well work. Since the ATmega is generating the clock signal, it knows which state the 6502 clock is in. Clocking the ATmega at 16 MHz and the 6502 at 1 MHz, the ATmega would have 16 cycles for every 6502 clock cycle. This could be just enough time to access the memory. Maybe something along the lines of this pseudo assembly routine could do the trick:
I haven't tried this approach yet, because it would be pretty much impossible to verify the timing without a logic analyzer or an oscilloscope. Without exactly correct timing, bad things will certainly happen.
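To get a feel for how tight the timesharing budget would be, here is a rough Python sketch of the cycle arithmetic. It assumes a 50% duty cycle on the 6502 clock, which the text does not state, and the function names are hypothetical.

```python
def avr_cycles_per_cpu_cycle(avr_hz, cpu_hz):
    """Whole ATmega cycles that fit into one 6502 clock cycle."""
    if avr_hz % cpu_hz != 0:
        raise ValueError("ATmega clock must be an integer multiple of the 6502 clock")
    return avr_hz // cpu_hz


def avr_cycles_in_low_phase(avr_hz, cpu_hz):
    """ATmega cycles available while the 6502 clock is low.

    Assumption: the 6502 clock has a 50% duty cycle, so the low phase
    gets half of the cycles (rounded up for odd counts).
    """
    total = avr_cycles_per_cpu_cycle(avr_hz, cpu_hz)
    return total - total // 2


if __name__ == "__main__":
    print(avr_cycles_per_cpu_cycle(16_000_000, 1_000_000))  # 16
    print(avr_cycles_in_low_phase(16_000_000, 1_000_000))   # 8
    print(avr_cycles_in_low_phase(16_000_000, 4_000_000))   # 2
```

Under that assumption only 8 of the 16 cycles fall in the low phase at 1 MHz, and just 2 at 4 MHz, which shows how quickly the window shrinks as the 6502 clock goes up.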
But there is a third, much simpler way and this is what I ended up doing. The ATmega is generating the clock signal and it can halt the 6502 whenever it needs to access the memory. Now that I have upgraded the CPU, halting it is ridiculously simple. I just have to set the ATmega timer frequency to zero and the clock will stop in whatever state it was. Resuming the clock is just as simple: I set the timer frequency back to whatever value it had before. This solution has the nice property that the 6502 can be clocked independently from the video chip, so a 4 MHz system clock or even higher is no problem at all.
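The halt and resume behaviour described above can be pictured with a toy Python model. This is not AVR register code and not the firmware itself; the class and its `divider` field are invented purely for illustration.

```python
class ClockGenerator:
    """Toy model of a timer-driven clock output (hypothetical, not AVR code)."""

    def __init__(self, divider):
        self.divider = divider        # timer ticks per output toggle; 0 = stopped
        self.saved_divider = divider  # value to restore on resume
        self.level = 0                # current level of the clock output
        self.count = 0

    def halt(self):
        """Stop toggling; the output stays at whatever level it had."""
        if self.divider != 0:
            self.saved_divider = self.divider
            self.divider = 0

    def resume(self):
        """Restore the divider that was active before the halt."""
        self.divider = self.saved_divider

    def tick(self):
        """Advance one timer tick and return the clock output level."""
        if self.divider == 0:
            return self.level
        self.count += 1
        if self.count >= self.divider:
            self.count = 0
            self.level ^= 1
        return self.level


if __name__ == "__main__":
    clk = ClockGenerator(divider=1)
    print(clk.tick())                       # 1
    clk.halt()
    print([clk.tick() for _ in range(3)])   # [1, 1, 1]
    clk.resume()
    print(clk.tick())                       # 0
```

While halted the output simply holds its last level, so the 6502 sees one very long clock phase and carries on as if nothing happened once the clock resumes.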
Here is the piece of code I'm using to halt and restart the CPU:
The memory transfers are implemented in the firmware in the copymem128 routine. The routine halts the CPU, copies a 128-byte block from the external SRAM to the internal SRAM of the ATmega and resumes the CPU. The routine is called once on each of the first 13 scanlines just after the vertical sync. In total 13*128 = 1664 bytes are copied, which is a few bytes larger than the screen RAM. The screen RAM used to contain pointers to character data, but I have changed it to contain character indices instead. This cuts the number of bytes to be copied in half.
All the bytes copied are always on the same 256-byte page of RAM, so only the low byte of the address needs to be updated during the memory copy.
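The block copy and the low-byte-only addressing can be modelled in a few lines of Python. This is only a sketch of the idea, not the copymem128 routine: the helper names and the base address 0x0400 in the usage below are assumptions made for the example.

```python
BLOCK_SIZE = 128        # bytes copied per scanline
BLOCKS_PER_FRAME = 13   # scanlines used for copying


def copy_block(external, internal, src_addr, dst_offset, size=BLOCK_SIZE):
    """Copy one block; only the low address byte changes inside the loop."""
    page = src_addr & ~0xFF   # upper address bits, set up once per block
    low = src_addr & 0xFF     # low byte, stepped for every transferred byte
    if low + size > 0x100:
        raise ValueError("block would cross a 256-byte page boundary")
    for i in range(size):
        internal[dst_offset + i] = external[page | (low + i)]


def copy_screen(external, internal, base_addr):
    """Copy all blocks for one frame and return the number of bytes copied."""
    for block in range(BLOCKS_PER_FRAME):
        offset = block * BLOCK_SIZE
        copy_block(external, internal, base_addr + offset, offset)
    return BLOCKS_PER_FRAME * BLOCK_SIZE


if __name__ == "__main__":
    external = bytearray(i & 0xFF for i in range(0x10000))  # stand-in for the SRAM chip
    internal = bytearray(BLOCKS_PER_FRAME * BLOCK_SIZE)
    print(copy_screen(external, internal, 0x0400))           # 1664
```

Because 128 divides 256 evenly, a block that starts on a 128-byte boundary can never straddle a page, which is what makes the low-byte-only update safe.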
Here is the piece of code that copies the 128 bytes. I had to insert an extra nop in the loop, otherwise the data would not be copied correctly. Even without the nop, the latency should be within the specs of the 70 ns SRAM, so I suspect that the breadboard must be causing problems here. I will try to optimize the nop away when I eventually build this on a PCB.
Why 128 bytes? I had hoped to copy an entire 256-byte page per scanline, but unfortunately there is not enough time per scanline for that. The ATmega has only 1024 cycles per scanline.
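The per-byte cycle budget makes the limit easy to see. The sketch below assumes a 16 MHz ATmega clock and the standard 64 µs PAL line period; the function names are made up for the example.

```python
def cycles_per_scanline(avr_hz=16_000_000, line_us=64):
    """ATmega cycles in one PAL scanline (64 microseconds per line)."""
    return avr_hz * line_us // 1_000_000


def cycle_budget_per_byte(bytes_per_line, cycles=None):
    """Upper bound on the cycles available for each copied byte."""
    if cycles is None:
        cycles = cycles_per_scanline()
    return cycles // bytes_per_line


if __name__ == "__main__":
    print(cycles_per_scanline())        # 1024
    print(cycle_budget_per_byte(256))   # 4
    print(cycle_budget_per_byte(128))   # 8
```

With 256 bytes per line each byte would get at most 4 cycles, and the 2-cycle store into internal SRAM alone would use half of that, leaving 2 cycles for setting the address, reading the external chip and looping. At 128 bytes the budget doubles to 8 cycles per byte, and even that is an upper bound since the scanline has other work to do as well.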
Finally the project is in a state where the 6502 can do something visible. To test the video memory interface I assembled a small 6502 program to update bytes in the screen memory area. It first clears the screen and then prints some text on the screen in a loop. Printing has been artificially slowed down by adding a delay loop because a 6502 running at 1 MHz is such a beast ;-).
Below is a video showing the output of the test program and the 6502 source code. Writing larger programs by manually typing in opcodes is going to be really tedious, so I need to get a real assembler soon!
Thanks for reading! As always you can find the latest version of the source code at GitHub.