The Logic Behind Array Subscript Access

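For arrays, subscripting is a way to read and write elements by random access, and it is the most common one. Many textbooks, however (especially Chinese ones), state at the outset that an array name is a pointer, which is wrong. There is also a set of rules behind array subscript access, and knowing them lets you work out what code actually means when the semantics get complicated.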

Suppose we have an array x containing 6 elements of type int:

```cpp
int x[6]={1,2,3,4,5,6};
```

We can subscript x:

```cpp
x[2]==3;
```

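The question is: what actually happens when we subscript? How does the expression reach the element of x at index 2 (the third element)?
You may also have seen another, rather odd, way of subscripting an array: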

```cpp
2[x]==x[2];
```

ISO/IEC 14882:2014(E): The subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.

This means that array subscripting is defined as a combination of + and * (dereference).
When we write x[2], it is converted to *(x+2).
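A minimal C++ sketch of this rewriting (the function name subscript_is_pointer_arithmetic is illustrative, not from the original): for every index it checks that x[i] and *(x+i) name the same object, and that x+i lies i*sizeof(int) bytes past the start of x, which is the scaling + performs on a pointer.

```cpp
#include <cassert>
#include <cstddef>

// Sketch: x[i] is shorthand for *(x + i), and + on a pointer advances
// in units of the element type, i.e. i * sizeof(int) bytes here.
bool subscript_is_pointer_arithmetic() {
    int x[6] = {1, 2, 3, 4, 5, 6};
    for (std::size_t i = 0; i < 6; ++i) {
        assert(&x[i] == x + i);    // same object...
        assert(x[i] == *(x + i));  // ...so same value
        // Byte distance between the start of x and x + i.
        std::ptrdiff_t bytes = reinterpret_cast<const char*>(x + i) -
                               reinterpret_cast<const char*>(x);
        assert(bytes == static_cast<std::ptrdiff_t>(i * sizeof(int)));
    }
    return x[2] == 3 && *(x + 2) == 3;
}
```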
Since addition is commutative:

```cpp
*(x+2)==*(2+x);
```

That is:

```cpp
x[2]==2[x];
```
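The swap works wherever the built-in subscript applies, not only on array names. This C++ sketch (function names are illustrative) applies it to a pointer into the middle of x and to a string literal; it relies only on the E1[E2] == *((E1)+(E2)) rule quoted above.

```cpp
#include <cassert>

// 1["abc"] is *(1 + "abc"), the same element as "abc"[1].
char swapped_literal_subscript() {
    return 1["abc"];
}

bool swapped_subscripts_agree() {
    int x[6] = {1, 2, 3, 4, 5, 6};
    const int* p = x + 1;    // p points at x[1]
    assert(2[x] == x[2]);    // array operand
    assert(3[p] == p[3]);    // pointer operand: both are x[4]
    assert(p[3] == 5);
    return 2[x] == 3;
}
```

Swapped subscripts are legal but hard to read, so they are shown here only because they expose the underlying rewriting.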

Generalizing further:

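```cpp
const int j=2;
// Each comparison below is true; all of them name the same element, x[2].
x[j] == *(&x[0]+j);
x[j] == *(x+j);
*(x+j) == *(j+x);
*(j+x) == j[x];
```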

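But one question remains: what exactly is an array name? It looks like a pointer to the array's first element, but that is incorrect (or at least imprecise).
The conclusion first: when an array name appears in an expression, it is converted to a pointer to its first element (for a multidimensional array, its first subarray). The main exceptions are the operands of sizeof and of unary &, where the name still denotes the whole array.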
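The difference between the array and the pointer it converts to can be made visible with static_assert and std::is_same. In this C++11 sketch, the unary + in +x is only a convenient way to force the array-to-pointer conversion so that decltype reports the resulting type; the function name is illustrative.

```cpp
#include <cassert>
#include <type_traits>

int x[6] = {1, 2, 3, 4, 5, 6};

// x itself has array type; sizeof and unary & operate on the whole array.
static_assert(std::is_same<decltype(x), int[6]>::value, "x is an array of 6 int");
static_assert(sizeof(x) == 6 * sizeof(int), "sizeof measures the whole array");
static_assert(std::is_same<decltype(&x), int (*)[6]>::value, "&x points to the whole array");
// Unary + triggers the array-to-pointer conversion.
static_assert(std::is_same<decltype(+x), int*>::value, "+x is int*");

bool decays_only_in_expressions() {
    int* p = x;                          // implicit array-to-pointer conversion
    assert(p == &x[0]);
    assert(sizeof(p) == sizeof(int*));   // the pointer carries no length
    return sizeof(x) / sizeof(x[0]) == 6;
}
```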

```cpp
int x[3][5]={{1,2,3,4,5},{6,7,8,9,10},{11,12,13,14,15}}; // Here x is a 3 × 5 array of integers.
```

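Let's look at what happens here in the intermediate code (LLVM IR). The listing was evidently generated from a version of the array named a (_ZZ4mainE1a is the mangled name of main::a) whose third row is zero-initialized, but the structure is the same: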

```llvm
; Array initializer: a private constant holding the initial values
@_ZZ4mainE1a = private unnamed_addr constant [3 x [5 x i32]] [[5 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5], [5 x i32] [i32 6, i32 7, i32 8, i32 9, i32 10], [5 x i32] zeroinitializer], align 16
; Allocate storage
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca [3 x [5 x i32]], align 16
%7 = alloca [5 x i32]*, align 8
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
; Reinterpret the pointer to the array (%6) as an i8* for memcpy
%8 = bitcast [3 x [5 x i32]]* %6 to i8*
; Copy the initializer values into the storage allocated above
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %8, i8* bitcast ([3 x [5 x i32]]* @_ZZ4mainE1a to i8*), i64 60, i32 16, i1 false)
```

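As you can see, the IR only ever handles the array through pointers: %6 is a pointer to the whole [3 x [5 x i32]] object, and it is reinterpreted as an i8* before the copy.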
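The i64 60 in the memcpy call is the array's size in bytes: 3 × 5 elements times sizeof(int), which is 4 on the target this IR came from (i32). A rough C++ rendering of what the listing does might look like this; image and init_like_ir are hypothetical names, and this is a sketch of the idea rather than what the compiler literally emits.

```cpp
#include <cassert>
#include <cstring>

// Read-only "image" of the initializer; the third row is zero-filled,
// matching the zeroinitializer row in the IR. (Hypothetical name.)
static const int image[3][5] = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}};

// Arrays of arrays have no padding between rows, so the byte size is exact.
static_assert(sizeof(int[3][5]) == 3 * 5 * sizeof(int), "rows are contiguous");

int init_like_ir(int row, int col) {
    assert(row >= 0 && row < 3 && col >= 0 && col < 5);
    int a[3][5];                       // like: alloca [3 x [5 x i32]]
    std::memcpy(a, image, sizeof a);   // like: the llvm.memcpy call
    return a[row][col];
}
```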
When we assign x to a pointer object:

```cpp
auto xp=x;
```

The corresponding IR is:

```llvm
%9 = getelementptr inbounds [3 x [5 x i32]], [3 x [5 x i32]]* %6, i32 0, i32 0
store [5 x i32]* %9, [5 x i32]** %7, align 8
```
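The getelementptr with indices 0, 0 computes the address of x[0], the first row, which is why %7 (xp) has type [5 x i32]*. A C++ sketch of the same thing (the function name is illustrative) checks the type that auto deduces and where xp points:

```cpp
#include <cassert>
#include <type_traits>

int x[3][5] = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12, 13, 14, 15}};

auto xp = x;   // array-to-pointer conversion: xp points at the row x[0]
static_assert(std::is_same<decltype(xp), int (*)[5]>::value, "xp is int (*)[5], not int**");

bool xp_points_at_first_row() {
    assert(xp == &x[0]);        // what getelementptr ..., i32 0, i32 0 computes
    assert(xp + 1 == &x[1]);    // one step skips a whole row of 5 ints
    return (*xp)[4] == 5 && xp[1][0] == 6;
}
```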

When x appears in an expression, it is converted to a pointer to (the first of three) five-membered arrays of integers.

When x appears in an expression, it is converted to a pointer to the object x[0] (that is, to an array of five ints).

In the expression x[i] which is equivalent to *(x+i), x is first converted to a pointer as described; then x+i is converted to the type of x, which involves multiplying i by the length of the object to which the pointer points, namely five integer objects. The results are added and indirection applied to yield an array (of five integers), which in turn is converted to a pointer to the first of the integers.
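The three steps in that description can be traced one by one in C++. In this sketch, row_byte_offset is an illustrative helper: it forms x + i as an int (*)[5], dereferences it to obtain the int[5] row (which converts to int*), and returns how many bytes x + i lies past x.

```cpp
#include <cassert>
#include <cstddef>

std::ptrdiff_t row_byte_offset(std::size_t i) {
    static int x[3][5] = {};
    assert(i < 3);
    int (*row)[5] = x + i;   // x converts to int (*)[5]; + i scales by sizeof(int[5])
    int* first = *row;       // *(x + i) is an int[5], which converts to int*
    assert(first == &x[i][0]);
    return reinterpret_cast<const char*>(row) - reinterpret_cast<const char*>(x);
}
```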

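This means that an array name is not a pointer: it denotes an array object, which can be implicitly converted, inside an expression, to a pointer to its first element (object).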
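A practical consequence: a function parameter declared as int a[] is adjusted to int* a, so the callee receives only a pointer and loses the array's length, while a reference-to-array parameter keeps the array type. A C++ sketch with illustrative function names:

```cpp
#include <cassert>
#include <cstddef>

std::size_t size_seen_by_callee(int a[]) {   // the parameter is really int* a
    return sizeof(a);                        // size of a pointer, not of the array
}

template <std::size_t N>
constexpr std::size_t length_of(const int (&)[N]) {   // reference to array: no conversion
    return N;
}

bool name_is_not_pointer() {
    int x[6] = {1, 2, 3, 4, 5, 6};
    assert(sizeof(x) == 6 * sizeof(int));            // the array object itself
    assert(size_seen_by_callee(x) == sizeof(int*));  // only a pointer arrives
    return length_of(x) == 6;
}
```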
A consistent rule extends this concept to multidimensional arrays.

If there is another subscript the same argument applies again; this time the result is an integer.

That is:

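```cpp
// &x, &x[0] and &x[0][0] hold the same address but have different types,
// so they can only be compared after converting to void*:
static_cast<void*>(&x) == static_cast<void*>(&x[0]);        // &x is int (*)[3][5]
static_cast<void*>(&x[0]) == static_cast<void*>(&x[0][0]);  // &x[0] is int (*)[5]
x == &x[0];       // x converts to int (*)[5]; it is not an int**
*x == &x[0][0];   // *x is x[0], an int[5] that converts to int*
**x == x[0][0];
&**x == &x[0][0];
```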
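Applying the rule twice gives x[i][j] == *(*(x+i)+j): *(x+i) is the int[5] row, which converts to int*, and + j then steps by sizeof(int). A short C++ sketch (function name illustrative) that walks the 3 × 5 array both ways:

```cpp
#include <cassert>
#include <cstddef>

int sum_with_explicit_derefs() {
    int x[3][5] = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12, 13, 14, 15}};
    int sum = 0;
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 5; ++j) {
            assert(x[i][j] == *(*(x + i) + j));   // subscript vs. explicit derefs
            sum += *(*(x + i) + j);
        }
    }
    return sum;   // 1 + 2 + ... + 15
}
```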

If E is an n-dimensional array of rank i×j×…×k, then E appearing in an expression that is subject to the array-to-pointer conversion (4.2) is converted to a pointer to an (n −1)-dimensional array with rank j×…×k. If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to (n − 1)-dimensional array, which itself is immediately converted into a pointer.
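For a 2×3×4 array E, each * peels off one dimension and the result converts straight back to a pointer. This C++11 sketch spells out the type at each level with std::is_same (unary + is used only to force the array-to-pointer conversion before decltype inspects the type); the function name is illustrative.

```cpp
#include <cassert>
#include <type_traits>

int E[2][3][4];

static_assert(std::is_same<decltype(+E), int (*)[3][4]>::value, "E -> pointer to 3x4 array");
static_assert(std::is_same<decltype(+*E), int (*)[4]>::value, "*E -> pointer to int[4]");
static_assert(std::is_same<decltype(+**E), int*>::value, "**E -> pointer to int");
static_assert(std::is_same<decltype(***E), int&>::value, "***E is an int lvalue");

bool levels_share_address() {
    // Same address at every level; only the pointee type changes.
    assert(static_cast<void*>(E) == static_cast<void*>(*E));
    assert(static_cast<void*>(*E) == static_cast<void*>(**E));
    return &***E == &E[0][0][0];
}
```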

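```cpp
const int i=2,j=3,k=4;
int E[i][j][k]; // declares a three-dimensional array of integers, with rank 2×3×4.
// E[1] is the array object spanning E[1][0][0] through E[1][j-1][k-1].
int *firstElementPtr=&E[0][0][0]; // &**E would have type int (*)[4], not int*
// Strictly, the standard only defines pointer arithmetic within one int[k] row;
// this flattened form relies on the arrays being laid out contiguously.
E[1][2][3]==*(firstElementPtr+(1*j*k)+(2*k)+3);
```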
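The offset 1*j*k + 2*k + 3 is an instance of the row-major formula a*J*K + b*K + c for E[a][b][c]. A small C++ sketch that checks it against the actual layout; row_major_offset and offsets_match_layout are hypothetical names, and the check measures byte distances from the start of E through char pointers instead of stepping an int* across rows.

```cpp
#include <cassert>
#include <cstddef>

// Row-major offset (in elements) of E[a][b][c] in an array of rank I x J x K.
constexpr std::size_t row_major_offset(std::size_t a, std::size_t b, std::size_t c,
                                       std::size_t J, std::size_t K) {
    return a * J * K + b * K + c;
}

bool offsets_match_layout() {
    const std::size_t I = 2, J = 3, K = 4;
    static int E[I][J][K];
    const char* base = reinterpret_cast<const char*>(&E);
    for (std::size_t a = 0; a < I; ++a)
        for (std::size_t b = 0; b < J; ++b)
            for (std::size_t c = 0; c < K; ++c) {
                const char* p = reinterpret_cast<const char*>(&E[a][b][c]);
                assert(static_cast<std::size_t>(p - base) ==
                       row_major_offset(a, b, c, J, K) * sizeof(int));
            }
    return row_major_offset(1, 2, 3, J, K) == 23;   // 1*12 + 2*4 + 3
}
```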
The end. If you spot any shortcomings, please point them out in the comments.


Author: 查利鹏
Published: 2017/01/16 00:59
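Original link: https://imzlp.com/posts/20449/
License: CC BY-NC-SA 4.0
Full reproduction of this article is prohibited; when sharing excerpts, please keep the link to the original and the author information. Thank you!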