Let me present two lists of programming languages.
| List One
| List Two |
| COBOL (1959) |
BASIC (1964) |
| Fortran (1957) |
C (1973) |
| FoxPro (?) |
C# (2001) |
| Lua (1993) |
Java (1995) |
| MATLAB (late 1970s) |
PHP (1995) |
| PL/I (1964) |
Python (1991) |
| RPG (1959) |
Ruby (1995) |
| Smalltalk (1972) |
|
|
|
Complete list on Wikipedia
Unfortunately SEO dictates that the title of this post already describes these lists: List one is the majority of the 1-indexed programming languages I could dig up and list two is a sampling of 0 indexed languages. You see, last night a good friend of mine asserted that the only reason modern languages use 0 indexed arrays is convention; they are based on old languages which were more hardware-focused. (Chris, feel free to correct me if I’m not phrasing that fairly). Unfortunately (for him), this seems incorrect.
Old languages were 1 indexed
Look at the years in the table above. It seems like 1 indexing was rather popular during the early days of language development. COBOL and Fortran are the among the oldest languages around and thanks to their prevalence in finance, government, and military are still in surprisingly wide use today. But Algol is more fun to focus on. If you aren’t familiar with language history, look at the Wikipedia entry on languages derived from Algol and notice how every language in list 2 (the 0-indexed list) is derived from ALGOL (a 1-indexed language).
Clearly, the convention early on was 1-indexing which is not surprising. Up until this point people tended to think of 1 as the first item in a set. But as languages evolved, there came later was a barrage of 0-indexed languages.
Dijkstra’s opinion
If you are unfamilar with Edsger Dijstra then: a) you make me sad and b) do some quick googling. Dijkstra is credited for inventing many of the core concepts of software development, including abstraction, using loops, Dijkstra’s algorithm (shortest-path), and much more. He even wrote a paper on this matter: which is based on the premise that in programming we often have to operate on a sequence of natural numbers. If you disagree with that, stop reading. His opinion explained:
Start with the sequence of numbers 2 through 12. (2,3,4,5,6,7,8,9,10,11,12). Dijkstra shows the 4 ways to denote these sequences:
a) 2 ≤ i < 13
b) 1 < i ≤ 12
c) 2 ≤ i ≤ 12
d) 1 < i < 13
In the first two options), the difference between the min and the max is equal to the number of values in the range. (13 - 2 == 12 - 1 == 11) and two adjacent number ranges (1,2,3 and 4,5,6) are expressed in a way that the upper bound of the first range will be equal to the lower bound of the second range.
Dijkstra also suggests that in computing it is common to get sequences that consist of only the smallest natural number (0) or sequences that are empty. Sequences starting at 0 would be expressed as -1 < i for methods b and d above. This forces the use nonnatural numbers even though we are only describing natural numbers. With c, an empty set would be represented as: 0 ≤ i ≤ -1 which is hard to support.
So, in Dijkstra’s opinion, a is the only way to cleanly describe sets of natural numbers.
Practical Concerns
Another angle of approach is the examination of what programming in 1-indexed languages looks like:
Array positioning
Consider the common task of copying an array into a specific spot of a larger array. Normally this would look like:
# copy src into dest at position p
copy(src, dest, p) {
for i from 0 to src.length -1
dest[p + i] vect[i]
}
In 1-index land, the extra index needs to be accounted for:
# same thing with 1-indexing:
copy(src, dest, p) {
for i from 1 to src.length
dest[p + i - 1] = vect[i]
}
Hashing
Consider the common task of creating a hash table to optimize lookups. Again, if you don’t believe this is a common task, please stop reading.
To keep it practical, consider a system that keeps track of individual users. Each user has a unique, numeric user ID and a name. This can be represented as:
user {
int id;
string name;
}
To facilitate lookups, we devise a very simple hashing function:
function hash(int input) {
return input % 5;
}
Create an array to store these hashed users
Array user_lookup[5];
Each item in the array is a linked list containing all the users whose ID mapped to that value. So we add users by doing: user_lookup[hash(user.id)].add(user);
However, the first user to come through with id 500 breaks things because their value comes to 0. So now, we have to add a +1 somewhere compensate.
C-style pointer math
One common argument is that in C, arrays are pointers to memory.
So given the following:
int array[5];
The statement: a[3] is actually translated to *(a + 3 * sizeof(int))
Adding accounting for 1 indexing would turn that into:
*(a + 3* sizeof(int) + 1) which gets back to the elegance and unnecessary work arguments suggested above.
Conclusion
Originally, I had hoped to do something clever with structs and unions in C to show that 1 indexing is broken. However, the best I could have achieved with that is proving that it is broken in C which doesn’t actually prove anything about programming a whole. Instead, Dijkstra’s argument on the elegance of 0-indexing coupled with practical examples apply to a much broader range of languages.