Yes, the thought of two vintage computing devices battling as digital gladiators in a cage match to the death under the watchful eye of our supercomputer overlord, the HAL 9000, is amusing at best. But this is Pi Day, and what better way to celebrate than with silicon blood. Who knows, perhaps this did happen a long time ago in The Grid.
Let the games begin!
In a battle before time itself (I’m referring to the lack of a clock in the Apple II and 41C), this David and Goliath-esque story started to unfold as I was cleaning my office not too long ago. (Cleaning my office is really an excuse to get my retro on.)
I’ve always been a fan (and user) of the Apple II, RPN calculators, and of course, Pi. It’s no wonder that I have stacks of machines, journals, books, magazines, printouts, etc… related to all three topics.
As I was filtering through stacks of papers to determine the value of each scrap of paper, I stumbled across a 1980 article by Ron Knapp, PI TO 1,110 PLACES. I had forgotten about this article. I remember printing it out along with a 1981 followup article by the same author, FASTEST PI TO 1,000 PLACES. At the time I was interested in the relative 1000-digit Pi performance of all ’80s calculators, and gravitated towards the newer, faster code, and had forgotten to read the first article.
So, I read it. Hmmm…
The run-time for this program is relatively short: 30 places in 2 min., 90 places in 9 min., 200 places in 34 min., 1000 places in 11 ½ hr., and 1160 places in 15 ¼ hr. When you consider that an article in “The Best of Micro” some time ago stated that an Apple II had been programmed to calculate “pi” to 1000 places in 40 hours–the run-time for the 41-C, a shirt-pocked calculator, seems very fast indeed.
Wow! Really? Can this be true? I was determined to get to the bottom of this. I’ll be “cleaning” my office for a bit longer.
Confirming the Results
It didn’t take too long to track down the source of Ron Knapp’s statement; Robert J. Bishop’s article, APPLE PI:
Since the program is written entirely in BASIC it is understandably slow.
Ah ha! BASIC! That explains everything. Right?
I knew from my Apple II days in high school that machine language was faster than BASIC, as was Pascal. But is a circa 1977 computer running BASIC really 4-5 times slower than a circa 1979 pocket calculator? I wanted to see the results myself, and find out why?.
I do not have a clock card in my Apple //c, nor did I have the time to fashion one out of a serial cable and a GPS unit (I’ve done this before), and I am not going to sit around with a stopwatch; I opted to use Virtual ][ on my Mac. IMHO, Virtual ][ is the most authentic Apple II series emulator available. And thankfully Virtual ][ emulates a clock card.
For the HP-41C I'll be using my HP-41CX. The 41C and 41CX run at the same speed (the X in CX stands for eXpanded memory). Fortunately the CX (unlike the C) has a clock built-in.
I modified both programs to output to a printer and to leverage the clocks so that I could get an accurate reading of the performance. The output with source for the Apple II and HP-41C can be obtained here and here.
41 hours, 46 minutes, and 50 seconds. Confirmed, BASIC is indeed slow.
NOTE: I had to compute 1001 digits of Pi to get 1000 accurate digits.
NOTE: I did not use Ron's original 1980 program (where the first stone was thrown), but opted to use the optimized 1981 version instead. The Apple II would have taking a a beating in either case.
And the Winner Is...
Ron was right to boast. I have been unable to find any public Apple II results dated before 1982 to take back the title, Microcomputer/Calculator Class: Fastest Pi to 1000 digits.
To add insult to injury, in 1949, the hulking 30-ton giant ENIAC enslaved a team of four humans and computed Pi to 2037 digits in 70 wall clock hours over a three day holiday weekend. ENIAC throughout the computation also reversed the calculations to check for errors. The humans were leveraged as bio-mechanical RAM since ENIAC could only hold 200 decimal numbers in memory. Team ENIAC worked in shifts collecting, organizing, and feeding IBM punch cards. All compute, check, and human interaction took 70 hours[4, pp. 277-281, 731].
I’ll demonstrate later that 2037 digits in 70 hours is computationally the same as 1000 digits in ~17 hours. The Apple II in 1978 cannot best a 30-something-year-old cyborg?
Why? Did all three use the same algorithm? In the same way? Are there technological factors or limitations? Can the Apple II best the HP-41C and ENIAC?
To answer all those questions we need to better understand how to compute Pi to 1000 digits.
Computing Pi with arctan
There are many different methods to compute the digits of Pi by hand or with a computer. The arctan formulas since the second half of the 17th century have been the most popular for computing a relatively small (less than a million) number of Pi digits[5, pp. 65, 205]. So it is no surprise that Ron, Robert, and Team ENIAC used arctan as well. Specifically, Machin’s formula.
Machin’s arctan Formula
John Machin discovered and used this formula in 1706 to compute the first 100 digits of Pi[5, p. 72]. It has been the most widely used formula by Pi digit hunters until the early ’80′s when the state-of-the-art shifted to Gauss AGM and other exotic formulas[5, pp. 206]. However, Machin is still very popular because it’s easy to implement, fast, and has a low memory footprint. All three are important factors when using retro tech.
Many mathematical functions can be numerically computed with a series which is relatively easy to calculate. The arctan function possesses just such a series which was discovered by James Gregory in 1671[5, p. 67].
Combine the two and voilà! We have something that we can compute efficiently that will converge quickly.
The first series converges with log10(52) = 1.3979 decimal places per series term and the second converges with log10(2392) = 4.7568 decimal places per series term[5, p. 72].
The total number of terms required for each series for 1000 decimal Pi digits can be calculated with ⎡1000 / log10(52)⎤ = 716 and ⎡1000 / log10(2392)⎤ = 211 respectively. 927 terms total.
NOTE: The ⎡ ⎤ brackets is defined as round up to the next integer.
NOTE: The number of terms does not change if the numeric base for computation changes. IOW, it is constant. More on this later.
TIP: http://www.codecogs.com/latex/eqneditor.php was used to create all equations used in this article.
Go ahead, load up that last equation into Apple’s Integer BASIC and see what you get for Pi. 3? Don’t worry, you are are in good company; the biblical value of Pi = 3[6, p. 174] :-). Apple’s Floating Point BASIC doesn’t fair well either; at best, it would have computed 10 or 11 digits.
Clearly we are going to have to create some arbitrary-precision (AP) arithmetic code. The good news is that all we need is add, subtract, multiply, and divide. The simplest method for all four is to go old skool. That’s right, the same way you learned how to do this in elementary school is still applicable today. Simply represent each digit in an array and perform the operations the same way you would with pen and paper. Get out a pen and paper and try it!
It’s easier than it sounds. Although the adds and subtracts will be AP +/- AP, the multiply and divides will be AP ∗/÷ SP (single-precision). Now that’s even easier. AP:AP mult/div is harder and the subject of many papers and AP libraries.
For each operation, as your code loops from the least significant digit to the most significant digit (except divide, go the other way), the four arithmetic routines will perform two basic operations per iteration; add, sub, mult, div, and then a carry. I understand that this is an oversimplification, and if you want a clearer understanding, then I suggest you look at some code. However, the point I am trying to make is that if the number of digits doubles (e.g. computing Pi to 2000 digits), then the number of operations for AP:AP add/sub and SP:AP div/mult also doubles. It also works the other way. If I can half the number of digits, then I half my number of operations, and therefore half the time for basic arithmetic operations.
Covering Your Bases
The amount of effort it takes an 8-bit computer to add two base 10 numbers is the same effort as adding two base 256 (28) numbers. Therefore, if you can pack your 1000 digit base 10 arrays into fewer base 256 digits, then you can also reduce the number of operations and get a proportional speed up in time. Your new array size will = ⎡number of decimal digits / log10base⎤.
A few examples based on 1000 decimal digits of Pi:
|10||⎡1000 / log1010⎤||1000|
|28 (256)||⎡1000 / log10(28)⎤||416|
|216 (65536)||⎡1000 / log10(216)⎤||208|
|100,000||⎡1000 / log10100,000⎤||200|
|10,000,000,000||⎡1000 / log1010,000,000,000⎤||100|
The speed-up should be proportional to the array size assuming that a larger base will not have a disproportional overhead. E.g. The 6502, an 8-bit processor, will take more that twice the instructions to add two 16-bit numbers compared to two 8-bit numbers. However using 16-bit numbers on an 8-bit system is still more efficient that using base 10.
There is a caveat when not using base 10, 100, 1000, etc…; the end result must be converted back to base 10 if you are truly computing 1000 decimal digits of Pi. This adds a bit of overhead to the end of the computation, however it is only a fraction (e.g. 1/6th for base 216 to base 10) of the total computation.
I am assuming that ENIAC can leverage it’s double precision support and use base 1010 (10,000,000,000) and therefore only require an array of 100 base 1010 digits for 1000 decimal digits of Pi. Ron’s HP-41C program uses base 105 (100,000) requiring only 200 base 105 digit arrays. Robert’s Apple Pi uses base 10 requiring 1000 digit arrays.
This begins to shed some light on why Apple Pi performs so poorly. If Apple Pi could use base 100,000 or base 10,000,000,000 then its time would be reduced by a factor of 5 and 10 respectively. However, Integer BASIC and the 6502 are not designed to support this. ENIAC and the HP-41C are both designed and optimized for base 1010. As they should be; they are calculators.
Earlier I said, I’ll demonstrate later that 2037 digits in 70 hours is computationally the same as 1000 digits in ~17 hours.
That statement makes two assumptions. The first is that Team ENIAC used the Gregory expansion of Machin’s arctan formula, and the second is that their AP arithmetic scaled linearly. If that is the case, as is the case with Ron’s and Robert’s programs, and since the number of terms and the number of digits is a fixed ratio based on the desired number of digits; then if the number of desired digits double, then so does the number of terms to be computed and the size of the array to hold the digits, and therefore the amount of computation is double * double. IOW, the computational complexity is O(n2).
Simply put, if I double the digits the time increases by four. Therefore, the estimated time for ENIAC to compute 1000 digits is: 70 hr * (1000/2037)2 = 16.87 hr.
Back to Apple Pi
The number of terms to be computed will be constant regardless of the program’s numeric base. That leaves optimizing the arithmetic routines. Assuming that Robert’s BASIC program could be rewritten in base 256 (and it cannot without a lot of effort; effort that would not yield sufficient benefit), then the best speed up would be 41.78 hr * (416/1000) = 17.38 hr. Still deficient.
When writing fundamental AP routines it helps to have unsigned double precision integer support; if not, make your base the square-root of (the largest positive integer + 1), e.g. the largest HP-41C positive integer is 9,999,999,999, making the optimal base 100,000 since the HP-41C does not have double precision integer support. Because of the wacky integer support in Integer BASIC, the largest base would be 181 and not offer up enough performance.
Bottom line, Integer BASIC is most likely our culprit. In 1978 Robert had very few options. BASIC and assembly language were the dominate languages, and if I were in Robert’s shoes I would have picked BASIC too (Robert wrote the program in less than 40 hours). However, there was another option; Apple FORTH from Programma Consultants[7, p. 26]. And by 1980, when Ron first mentioned the Apple II’s poor performance, there were many more Apple languages to choose from, such as FORTRAN and Pascal.
Pi Day Rematch: Apple II vs. HP-41C
FORTH seems promising for a few reasons; one, it was available in 1978; two, it’s most like RPN, making a comparison to the 41C a bit more apples to apples; and three, with FORTH it will be easy to use a larger base (I used base 216) thus reducing the size of the arrays. Lastly, FORTH is fast.
Unfortunately I was unable to locate a copy of Apple FORTH from Programma Consultants, so I opted to use Mad Apple Forth (MAF) as a substitute. My program with timings can be located here. I didn’t have time to figure out how to access the clock card from MAF, so I used the Virtual ][ record function and then noted the time stamps on the output.
Start: 22:44:47 Print: 23:05:24 End: 23:09:43 Total Time: 00:24:56
The HP-41C and ENIAC were both schooled by the Apple II with FORTH.
HP-41C cries foul!
If you could only use what came in the box with your Apple II or HP-41C, then what would be the results? Fortunately the Apple II mini-assembler is in the box.
In 1982 Glen Bredon wrote an APPLE PI program in assembly that blows the battery door off the 41C with a time of 194 seconds. Game over! The Apple II wins again!
My 1496 sec. FORTH version doesn't hold an LED to assembly.
If you are looking for Glen's program, it is packaged with the Merlin Assembler.
HP-41C Machine Language
To be thorough I asked a friend of mine to compute Pi on the 41C with a 100% machine language (ML) program. He came back with 5 hours, 9 minutes. No question, the Apple II is king.
In my quest to best the 41C, I tried C as well. The C language hit the seen in the mid '80s making it easier to write and port over existing Pi programs.
Why not rub it in a bit?
|C Version||C Year||Base||Time (s)|
|Aztec K&R C 3.2b||1986||216||2180|
|Aztec K&R C 3.2b||1986||10||6746|
|cc65 ANSI C 2.12||2010||216||1764|
|cc65 ANSI C 2.13||2010||216||1514|
|cc65 ANSI C 2.13||2010||10||4241|
BASIC sucks. The HP-41C is truly awesome, and for a time bested the Apple II in the Fastest to Compute 1000 digits of Pi benchmark. But, in the end, the true potential of the Apple II smoked the 41C.
BTW, you'll never need more than 50 digits of Pi[5, p. 153].
On the other hand…
To continue reading this fascinating tale click here.
Some Great Pi Books
If you like Pi, especially computing Pi, then I recommend the following three books[4, 5, 6]:
- Knapp, Ron. PI TO 1,110 PLACES. PPC Calculator Journal Jun. 1980: 9-10.
- Knapp, Ron. FASTEST PI TO 1,000 PLACES. PPC Calculator Journal Aug.-Dec. 1981: 68-69
- Bishop, Robert J. APPLE PI. MICRO THE 6502 JOURNAL Aug.-Sep. 1978: 15-16.
- Berggren, Lennart, Jonathan M. Borwein, and Peter B. Borwein. Pi, a source book . 3rd ed. New York: Springer, 2004. Print.
- Arndt, Jörg, and Christoph Haenel. [Pi] – unleased . 2. ed. Berlin: Springer, 2001. Print.
- Beckmann, Petr. A history of [pi] (pi) . 4th ed. New York: Barnes & Noble, 19931971. Print.
- Apple Computer Inc. the best of the user group newsletters for 1978. Contact ’78 Dec. 1978: 26.
- Pi Day Deathmatch Poster. Dhemerae Ford (email@example.com).
- ENIAC photo (linked). Wikipedia.