
A Brief Look at Java Bytecode

Overview

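Java bytecode is a low-level, assembly-like instruction format that tells the JVM how to execute a Java program. Each instruction usually performs one simple operation, such as loading a value, doing arithmetic, or transferring control. Bytecode is neither native machine code nor high-level source code. It is an intermediate representation that sits between the two: the Java compiler turns source code into bytecode, and because any JVM can run the same bytecode, it is what makes Java's "write once, run anywhere" possible.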

Bytecode and the JVM

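The JVM is the virtual machine that executes Java bytecode. It reads the bytecode and runs it on the underlying operating system and hardware in one of two ways. The first is interpretation, in which instructions are decoded and executed one at a time. The second is compilation, in which the just-in-time (JIT) compiler translates bytecode into native machine code that is then executed directly. Mainstream JVMs such as HotSpot combine the two: code starts out interpreted, and frequently executed ("hot") methods are JIT-compiled.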
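To make interpretation concrete, here is a toy fetch-decode-execute loop. It is a minimal sketch and does not reflect how HotSpot's interpreter is actually built. The class name `ToyInterpreter` is hypothetical. The loop supports only a handful of int opcodes (`iconst_m1` to `iconst_5`, `bipush`, `iload_0` to `iload_3`, `iadd`, `isub`, `imul`, `ireturn`), using their opcode values from the instruction list, and it performs no verification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter: fetches one opcode at a time, decodes it with a switch, and executes it
// against an operand stack and a local variable array. Hypothetical; supports only a few opcodes.
class ToyInterpreter {

    static int run(byte[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int pc = 0;                                 // program counter: byte offset into code
        while (true) {
            int opcode = code[pc] & 0xFF;           // fetch (Java bytes are signed, so mask)
            switch (opcode) {                       // decode + execute
                case 0x02: case 0x03: case 0x04: case 0x05:
                case 0x06: case 0x07: case 0x08:    // iconst_m1 .. iconst_5
                    stack.push(opcode - 0x03);
                    pc += 1;
                    break;
                case 0x10:                          // bipush: next byte is a signed constant
                    stack.push((int) code[pc + 1]);
                    pc += 2;
                    break;
                case 0x1a: case 0x1b: case 0x1c: case 0x1d: // iload_0 .. iload_3
                    stack.push(locals[opcode - 0x1a]);
                    pc += 1;
                    break;
                case 0x60: case 0x64: case 0x68: {  // iadd, isub, imul
                    int value2 = stack.pop();       // the top of the stack is the right operand
                    int value1 = stack.pop();
                    stack.push(opcode == 0x60 ? value1 + value2
                             : opcode == 0x64 ? value1 - value2
                             : value1 * value2);
                    pc += 1;
                    break;
                }
                case 0xac:                          // ireturn
                    return stack.pop();
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // static int f(int x) { return x * 3 + 1; } -> iload_0, iconst_3, imul, iconst_1, iadd, ireturn
        byte[] code = {0x1a, 0x06, 0x68, 0x04, 0x60, (byte) 0xac};
        System.out.println(run(code, new int[] {5})); // prints 16
    }
}
```

A JIT compiler would instead translate such a sequence into native instructions once, which avoids paying the switch-dispatch cost on every execution.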

Main categories of bytecode instructions

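  • Load and store instructions: move data between local variables and the operand stack (e.g. aload_0, istore)
  • Arithmetic and logic instructions: perform basic arithmetic, bitwise, and comparison operations (e.g. ladd, fcmpl)
  • Type conversion instructions: convert values from one type to another (e.g. i2b, d2i)
  • Object creation and manipulation instructions: create and operate on objects and arrays (e.g. new, putfield)
  • Operand stack management instructions: manipulate the operand stack directly (e.g. swap, dup)
  • Control transfer instructions: change the flow of control (e.g. ifeq, goto)
  • Method invocation and return instructions: call methods and return from them (e.g. invokevirtual, areturn)
  • Synchronization instructions: support thread synchronization (e.g. monitorenter, monitorexit)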
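The categories above show up in ordinary Java code. In the sketch below, the class `InstructionCategories` and its methods are hypothetical. Each comment lists the instructions javac typically emits for that method. The exact output can vary between compiler versions, so compile the class and compare against `javap -c`.

```java
// Hypothetical class: each method exercises one instruction category.
// Comments show typical javac output (instance methods, so `this` is local variable 0).
class InstructionCategories {
    private int field;

    // Load/store + arithmetic: iload_1, iconst_2, imul, istore_2, iload_2, ireturn
    int twice(int x) {
        int y = x * 2;
        return y;
    }

    // Type conversion: iload_1, i2b, ireturn
    byte toByte(int x) {
        return (byte) x;
    }

    // Object creation + stack management: new, dup, invokespecial <init>, areturn
    StringBuilder newBuilder() {
        return new StringBuilder();
    }

    // Field manipulation: aload_0, iload_1, putfield, return
    void setField(int v) {
        this.field = v;
    }

    // Control transfer: iload_1, ifge, iload_1, ineg, goto, iload_1, ireturn
    int abs(int x) {
        return x < 0 ? -x : x;
    }

    // Method invocation + return: aload_1, invokevirtual String.length, ireturn
    int length(String s) {
        return s.length();
    }

    // Synchronization: the block is bracketed by monitorenter/monitorexit; javac also emits
    // a second monitorexit on the exception path so the lock is released if field++ throws.
    void increment(Object lock) {
        synchronized (lock) {
            field++; // aload_0, dup, getfield, iconst_1, iadd, putfield
        }
    }

    int getField() {
        return field;
    }

    public static void main(String[] args) {
        InstructionCategories c = new InstructionCategories();
        c.setField(41);
        c.increment(new Object());
        // prints "42 44 7 42": (byte) 300 wraps to 44 because i2b keeps only the low 8 bits
        System.out.println(c.twice(21) + " " + c.toByte(300) + " " + c.abs(-7) + " " + c.getField());
    }
}
```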

Local variable array

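The local variable array holds a method's parameters and local variables, and load and store instructions address its entries by index. In an instance (non-static) method, index 0 always holds this, the reference to the current object, so the parameters start at index 1. A static method has no this, so its parameters start at index 0.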
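Here is a minimal sketch of how parameters map to local variable indices. It assumes the standard rules: slot 0 holds `this` in an instance method, and each `long` or `double` value occupies two consecutive slots. The class `LocalSlots` is hypothetical. It reads a JVM method descriptor such as `(IJLjava/lang/String;[D)V` and does not check for malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns the local variable index where each parameter starts.
class LocalSlots {

    static List<Integer> parameterSlots(String descriptor, boolean isStatic) {
        List<Integer> slots = new ArrayList<>();
        int slot = isStatic ? 0 : 1;   // instance methods reserve slot 0 for `this`
        int i = 1;                     // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            slots.add(slot);
            boolean isArray = descriptor.charAt(i) == '[';
            while (descriptor.charAt(i) == '[') {
                i++;                   // skip array dimensions
            }
            char type = descriptor.charAt(i);
            if (type == 'L') {
                i = descriptor.indexOf(';', i); // class types run up to ';'
            }
            i++;                       // move past the last character of this parameter type
            // long (J) and double (D) take two slots; arrays are references and take one
            slot += (!isArray && (type == 'J' || type == 'D')) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // void m(int i, long j, String s, double[] d) as an instance method
        System.out.println(parameterSlots("(IJLjava/lang/String;[D)V", false)); // [1, 2, 4, 5]
        // the same parameters in a static method
        System.out.println(parameterSlots("(IJLjava/lang/String;[D)V", true));  // [0, 1, 3, 4]
        // an instance method int add(int a, int b): a is in slot 1, b in slot 2
        System.out.println(parameterSlots("(II)I", false));                     // [1, 2]
    }
}
```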

Operand stack

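The operand stack is a last-in, first-out (LIFO) structure that holds the inputs and outputs of instructions. While bytecode executes, the intermediate values of a computation live on this stack: an instruction pops its operands from the top and pushes its result back.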
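The sketch below hand-simulates the operand stack for the expression `a * b + c`. It assumes a static method with `a`, `b` and `c` in local variables 0 to 2, for which javac typically emits `iload_0, iload_1, imul, iload_2, iadd`. The class `OperandStackTrace` is hypothetical. It mirrors each instruction with `java.util.ArrayDeque` and records the stack after every step, written bottom-to-top.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical tracer: replays iload_0, iload_1, imul, iload_2, iadd on an explicit stack.
class OperandStackTrace {

    static List<String> trace(int a, int b, int c) {
        Deque<Integer> stack = new ArrayDeque<>();
        List<String> snapshots = new ArrayList<>();

        stack.push(a);                                 // iload_0
        snapshots.add("iload_0 -> " + render(stack));
        stack.push(b);                                 // iload_1
        snapshots.add("iload_1 -> " + render(stack));
        int v2 = stack.pop(), v1 = stack.pop();        // imul pops the top two values...
        stack.push(v1 * v2);                           // ...and pushes their product
        snapshots.add("imul -> " + render(stack));
        stack.push(c);                                 // iload_2
        snapshots.add("iload_2 -> " + render(stack));
        v2 = stack.pop();                              // iadd pops c and the product...
        v1 = stack.pop();
        stack.push(v1 + v2);                           // ...and pushes the sum
        snapshots.add("iadd -> " + render(stack));
        return snapshots;
    }

    // Bottom-to-top rendering: [x, y] means y is on top of the stack.
    private static String render(Deque<Integer> stack) {
        List<Integer> bottomToTop = new ArrayList<>();
        stack.descendingIterator().forEachRemaining(bottomToTop::add);
        return bottomToTop.toString();
    }

    public static void main(String[] args) {
        trace(2, 3, 4).forEach(System.out::println);
        // iload_0 -> [2], iload_1 -> [2, 3], imul -> [6], iload_2 -> [6, 4], iadd -> [10]
    }
}
```

Because the stack is LIFO, `imul` consumes `b` and `a` before `c` is ever pushed. This ordering is how a stack machine evaluates nested expressions without named temporaries.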

Bytecode example

Consider the following Java instance method:

// Example instance method (wrapped in a class so that it compiles)
class Example {
    public int add(int a, int b) {
        return a + b;
    }
}

Its bytecode, with an explanation of each instruction, is:

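| Instruction | Explanation | Operand stack (before -> after) |
| --- | --- | --- |
| iload_1 | Push the first int parameter (a, local variable 1) onto the stack | [] -> [a] |
| iload_2 | Push the second int parameter (b, local variable 2) onto the stack | [a] -> [a, b] |
| iadd | Pop two ints and push their sum | [a, b] -> [a + b] |
| ireturn | Return the int on top of the stack from the current method | [a + b] -> [] |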

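iload_1 and iload_2 push the parameters a and b onto the operand stack. Their indices start at 1 rather than 0 because, in an instance method, local variable 0 is always reserved for the this reference.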
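The same method can also be viewed at the byte level. In the sketch below, the byte arrays are the code arrays javac typically produces for the instance `add` above and for a static variant `static int add(int a, int b)`. The class `AddDisassembly` is hypothetical and decodes only the five opcodes involved. The static version uses `iload_0` and `iload_1` because it has no `this` in slot 0.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical mini-disassembler for one-byte opcodes (none of these take operands).
class AddDisassembly {
    private static final Map<Integer, String> MNEMONICS = Map.of(
            0x1a, "iload_0",
            0x1b, "iload_1",
            0x1c, "iload_2",
            0x60, "iadd",
            0xac, "ireturn");

    static String disassemble(byte[] code) {
        StringJoiner out = new StringJoiner(", ");
        for (byte b : code) {
            int opcode = b & 0xFF; // treat the byte as unsigned
            String name = MNEMONICS.get(opcode);
            if (name == null) {
                throw new IllegalArgumentException("opcode 0x" + Integer.toHexString(opcode) + " not in this sketch");
            }
            out.add(name);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] instanceAdd = {0x1b, 0x1c, 0x60, (byte) 0xac}; // int add(int a, int b): this is slot 0
        byte[] staticAdd = {0x1a, 0x1b, 0x60, (byte) 0xac};   // static int add(int a, int b)
        System.out.println(disassemble(instanceAdd)); // iload_1, iload_2, iadd, ireturn
        System.out.println(disassemble(staticAdd));   // iload_0, iload_1, iadd, ireturn
    }
}
```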

Bytecode instruction list

| Mnemonic | Opcode (hex) | Opcode (binary) | Other bytes [count]: [operand labels] | Stack [before] → [after] | Description |
| --- | --- | --- | --- | --- | --- |
| aaload | 32 | 0011 0010 | | arrayref, index → value | load onto the stack a reference from an array |
| aastore | 53 | 0101 0011 | | arrayref, index, value → | store a reference in an array |
| aconst_null | 01 | 0000 0001 | | → null | push a null reference onto the stack |
| aload | 19 | 0001 1001 | 1: index | → objectref | load a reference onto the stack from a local variable #index |
| aload_0 | 2a | 0010 1010 | | → objectref | load a reference onto the stack from local variable 0 |
| aload_1 | 2b | 0010 1011 | | → objectref | load a reference onto the stack from local variable 1 |
| aload_2 | 2c | 0010 1100 | | → objectref | load a reference onto the stack from local variable 2 |
| aload_3 | 2d | 0010 1101 | | → objectref | load a reference onto the stack from local variable 3 |
| anewarray | bd | 1011 1101 | 2: indexbyte1, indexbyte2 | count → arrayref | create a new array of references of length count and component type identified by the class reference index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| areturn | b0 | 1011 0000 | | objectref → [empty] | return a reference from a method |
| arraylength | be | 1011 1110 | | arrayref → length | get the length of an array |
| astore | 3a | 0011 1010 | 1: index | objectref → | store a reference into a local variable #index |
| astore_0 | 4b | 0100 1011 | | objectref → | store a reference into local variable 0 |
| astore_1 | 4c | 0100 1100 | | objectref → | store a reference into local variable 1 |
| astore_2 | 4d | 0100 1101 | | objectref → | store a reference into local variable 2 |
| astore_3 | 4e | 0100 1110 | | objectref → | store a reference into local variable 3 |
| athrow | bf | 1011 1111 | | objectref → [empty], objectref | throw an error or exception (the rest of the stack is cleared, leaving only a reference to the Throwable) |
| baload | 33 | 0011 0011 | | arrayref, index → value | load a byte or Boolean value from an array |
| bastore | 54 | 0101 0100 | | arrayref, index, value → | store a byte or Boolean value into an array |
| bipush | 10 | 0001 0000 | 1: byte | → value | push a byte onto the stack as an integer value |
| breakpoint | ca | 1100 1010 | | | reserved for breakpoints in Java debuggers; should not appear in any class file |
| caload | 34 | 0011 0100 | | arrayref, index → value | load a char from an array |
| castore | 55 | 0101 0101 | | arrayref, index, value → | store a char into an array |
| checkcast | c0 | 1100 0000 | 2: indexbyte1, indexbyte2 | objectref → objectref | check whether objectref is of a certain type, whose class reference is in the constant pool at index (indexbyte1 << 8 \| indexbyte2) |
| d2f | 90 | 1001 0000 | | value → result | convert a double to a float |
| d2i | 8e | 1000 1110 | | value → result | convert a double to an int |
| d2l | 8f | 1000 1111 | | value → result | convert a double to a long |
| dadd | 63 | 0110 0011 | | value1, value2 → result | add two doubles |
| daload | 31 | 0011 0001 | | arrayref, index → value | load a double from an array |
| dastore | 52 | 0101 0010 | | arrayref, index, value → | store a double into an array |
| dcmpg | 98 | 1001 1000 | | value1, value2 → result | compare two doubles, 1 on NaN |
| dcmpl | 97 | 1001 0111 | | value1, value2 → result | compare two doubles, -1 on NaN |
| dconst_0 | 0e | 0000 1110 | | → 0.0 | push the constant 0.0 (a double) onto the stack |
| dconst_1 | 0f | 0000 1111 | | → 1.0 | push the constant 1.0 (a double) onto the stack |
| ddiv | 6f | 0110 1111 | | value1, value2 → result | divide two doubles |
| dload | 18 | 0001 1000 | 1: index | → value | load a double value from a local variable #index |
| dload_0 | 26 | 0010 0110 | | → value | load a double from local variable 0 |
| dload_1 | 27 | 0010 0111 | | → value | load a double from local variable 1 |
| dload_2 | 28 | 0010 1000 | | → value | load a double from local variable 2 |
| dload_3 | 29 | 0010 1001 | | → value | load a double from local variable 3 |
| dmul | 6b | 0110 1011 | | value1, value2 → result | multiply two doubles |
| dneg | 77 | 0111 0111 | | value → result | negate a double |
| drem | 73 | 0111 0011 | | value1, value2 → result | get the remainder from a division between two doubles |
| dreturn | af | 1010 1111 | | value → [empty] | return a double from a method |
| dstore | 39 | 0011 1001 | 1: index | value → | store a double value into a local variable #index |
| dstore_0 | 47 | 0100 0111 | | value → | store a double into local variable 0 |
| dstore_1 | 48 | 0100 1000 | | value → | store a double into local variable 1 |
| dstore_2 | 49 | 0100 1001 | | value → | store a double into local variable 2 |
| dstore_3 | 4a | 0100 1010 | | value → | store a double into local variable 3 |
| dsub | 67 | 0110 0111 | | value1, value2 → result | subtract a double from another |
| dup | 59 | 0101 1001 | | value → value, value | duplicate the value on top of the stack |
| dup_x1 | 5a | 0101 1010 | | value2, value1 → value1, value2, value1 | insert a copy of the top value into the stack two values from the top; value1 and value2 must not be of type double or long |
| dup_x2 | 5b | 0101 1011 | | value3, value2, value1 → value1, value3, value2, value1 | insert a copy of the top value into the stack two values from the top (if value2 is double or long, it also takes up the entry of value3) or three values from the top (if value2 is neither double nor long) |
| dup2 | 5c | 0101 1100 | | {value2, value1} → {value2, value1}, {value2, value1} | duplicate the top two stack words (two values if value1 is neither double nor long; a single value if value1 is double or long) |
| dup2_x1 | 5d | 0101 1101 | | value3, {value2, value1} → {value2, value1}, value3, {value2, value1} | duplicate two words and insert them beneath the third word (see dup2 above) |
| dup2_x2 | 5e | 0101 1110 | | {value4, value3}, {value2, value1} → {value2, value1}, {value4, value3}, {value2, value1} | duplicate two words and insert them beneath the fourth word |
| f2d | 8d | 1000 1101 | | value → result | convert a float to a double |
| f2i | 8b | 1000 1011 | | value → result | convert a float to an int |
| f2l | 8c | 1000 1100 | | value → result | convert a float to a long |
| fadd | 62 | 0110 0010 | | value1, value2 → result | add two floats |
| faload | 30 | 0011 0000 | | arrayref, index → value | load a float from an array |
| fastore | 51 | 0101 0001 | | arrayref, index, value → | store a float in an array |
| fcmpg | 96 | 1001 0110 | | value1, value2 → result | compare two floats, 1 on NaN |
| fcmpl | 95 | 1001 0101 | | value1, value2 → result | compare two floats, -1 on NaN |
| fconst_0 | 0b | 0000 1011 | | → 0.0f | push 0.0f onto the stack |
| fconst_1 | 0c | 0000 1100 | | → 1.0f | push 1.0f onto the stack |
| fconst_2 | 0d | 0000 1101 | | → 2.0f | push 2.0f onto the stack |
| fdiv | 6e | 0110 1110 | | value1, value2 → result | divide two floats |
| fload | 17 | 0001 0111 | 1: index | → value | load a float value from a local variable #index |
| fload_0 | 22 | 0010 0010 | | → value | load a float value from local variable 0 |
| fload_1 | 23 | 0010 0011 | | → value | load a float value from local variable 1 |
| fload_2 | 24 | 0010 0100 | | → value | load a float value from local variable 2 |
| fload_3 | 25 | 0010 0101 | | → value | load a float value from local variable 3 |
| fmul | 6a | 0110 1010 | | value1, value2 → result | multiply two floats |
| fneg | 76 | 0111 0110 | | value → result | negate a float |
| frem | 72 | 0111 0010 | | value1, value2 → result | get the remainder from a division between two floats |
| freturn | ae | 1010 1110 | | value → [empty] | return a float |
| fstore | 38 | 0011 1000 | 1: index | value → | store a float value into a local variable #index |
| fstore_0 | 43 | 0100 0011 | | value → | store a float value into local variable 0 |
| fstore_1 | 44 | 0100 0100 | | value → | store a float value into local variable 1 |
| fstore_2 | 45 | 0100 0101 | | value → | store a float value into local variable 2 |
| fstore_3 | 46 | 0100 0110 | | value → | store a float value into local variable 3 |
| fsub | 66 | 0110 0110 | | value1, value2 → result | subtract two floats |
| getfield | b4 | 1011 0100 | 2: indexbyte1, indexbyte2 | objectref → value | get a field value of an object objectref, where the field is identified by a field reference in the constant pool at index (indexbyte1 << 8 \| indexbyte2) |
| getstatic | b2 | 1011 0010 | 2: indexbyte1, indexbyte2 | → value | get a static field value of a class, where the field is identified by a field reference in the constant pool at index (indexbyte1 << 8 \| indexbyte2) |
| goto | a7 | 1010 0111 | 2: branchbyte1, branchbyte2 | [no change] | go to another instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| goto_w | c8 | 1100 1000 | 4: branchbyte1, branchbyte2, branchbyte3, branchbyte4 | [no change] | go to another instruction at branchoffset (signed int constructed from unsigned bytes branchbyte1 << 24 \| branchbyte2 << 16 \| branchbyte3 << 8 \| branchbyte4) |
| i2b | 91 | 1001 0001 | | value → result | convert an int into a byte |
| i2c | 92 | 1001 0010 | | value → result | convert an int into a character |
| i2d | 87 | 1000 0111 | | value → result | convert an int into a double |
| i2f | 86 | 1000 0110 | | value → result | convert an int into a float |
| i2l | 85 | 1000 0101 | | value → result | convert an int into a long |
| i2s | 93 | 1001 0011 | | value → result | convert an int into a short |
| iadd | 60 | 0110 0000 | | value1, value2 → result | add two ints |
| iaload | 2e | 0010 1110 | | arrayref, index → value | load an int from an array |
| iand | 7e | 0111 1110 | | value1, value2 → result | perform a bitwise AND on two integers |
| iastore | 4f | 0100 1111 | | arrayref, index, value → | store an int into an array |
| iconst_m1 | 02 | 0000 0010 | | → -1 | load the int value −1 onto the stack |
| iconst_0 | 03 | 0000 0011 | | → 0 | load the int value 0 onto the stack |
| iconst_1 | 04 | 0000 0100 | | → 1 | load the int value 1 onto the stack |
| iconst_2 | 05 | 0000 0101 | | → 2 | load the int value 2 onto the stack |
| iconst_3 | 06 | 0000 0110 | | → 3 | load the int value 3 onto the stack |
| iconst_4 | 07 | 0000 0111 | | → 4 | load the int value 4 onto the stack |
| iconst_5 | 08 | 0000 1000 | | → 5 | load the int value 5 onto the stack |
| idiv | 6c | 0110 1100 | | value1, value2 → result | divide two integers |
| if_acmpeq | a5 | 1010 0101 | 2: branchbyte1, branchbyte2 | value1, value2 → | if references are equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| if_acmpne | a6 | 1010 0110 | 2: branchbyte1, branchbyte2 | value1, value2 → | if references are not equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| if_icmpeq | 9f | 1001 1111 | 2: branchbyte1, branchbyte2 | value1, value2 → | if ints are equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| if_icmpge | a2 | 1010 0010 | 2: branchbyte1, branchbyte2 | value1, value2 → | if value1 is greater than or equal to value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| if_icmpgt | a3 | 1010 0011 | 2: branchbyte1, branchbyte2 | value1, value2 → | if value1 is greater than value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| if_icmple | a4 | 1010 0100 | 2: branchbyte1, branchbyte2 | value1, value2 → | if value1 is less than or equal to value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| if_icmplt | a1 | 1010 0001 | 2: branchbyte1, branchbyte2 | value1, value2 → | if value1 is less than value2, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| if_icmpne | a0 | 1010 0000 | 2: branchbyte1, branchbyte2 | value1, value2 → | if ints are not equal, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| ifeq | 99 | 1001 1001 | 2: branchbyte1, branchbyte2 | value → | if value is 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| ifge | 9c | 1001 1100 | 2: branchbyte1, branchbyte2 | value → | if value is greater than or equal to 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| ifgt | 9d | 1001 1101 | 2: branchbyte1, branchbyte2 | value → | if value is greater than 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| ifle | 9e | 1001 1110 | 2: branchbyte1, branchbyte2 | value → | if value is less than or equal to 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| iflt | 9b | 1001 1011 | 2: branchbyte1, branchbyte2 | value → | if value is less than 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| ifne | 9a | 1001 1010 | 2: branchbyte1, branchbyte2 | value → | if value is not 0, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| ifnonnull | c7 | 1100 0111 | 2: branchbyte1, branchbyte2 | value → | if value is not null, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| ifnull | c6 | 1100 0110 | 2: branchbyte1, branchbyte2 | value → | if value is null, branch to instruction at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) |
| iinc | 84 | 1000 0100 | 2: index, const | [no change] | increment local variable #index by signed byte const |
| iload | 15 | 0001 0101 | 1: index | → value | load an int value from a local variable #index |
| iload_0 | 1a | 0001 1010 | | → value | load an int value from local variable 0 |
| iload_1 | 1b | 0001 1011 | | → value | load an int value from local variable 1 |
| iload_2 | 1c | 0001 1100 | | → value | load an int value from local variable 2 |
| iload_3 | 1d | 0001 1101 | | → value | load an int value from local variable 3 |
| impdep1 | fe | 1111 1110 | | | reserved for implementation-dependent operations within debuggers; should not appear in any class file |
| impdep2 | ff | 1111 1111 | | | reserved for implementation-dependent operations within debuggers; should not appear in any class file |
| imul | 68 | 0110 1000 | | value1, value2 → result | multiply two integers |
| ineg | 74 | 0111 0100 | | value → result | negate an int |
| instanceof | c1 | 1100 0001 | 2: indexbyte1, indexbyte2 | objectref → result | determine whether objectref is of a given type, identified by the class reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| invokedynamic | ba | 1011 1010 | 4: indexbyte1, indexbyte2, 0, 0 | [arg1, arg2, …] → result | invoke a dynamically-computed call site and put the result on the stack (might be void); the call site is identified by the CONSTANT_InvokeDynamic entry at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| invokeinterface | b9 | 1011 1001 | 4: indexbyte1, indexbyte2, count, 0 | objectref, [arg1, arg2, …] → result | invoke an interface method on object objectref and put the result on the stack (might be void); the interface method is identified by the method reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| invokespecial | b7 | 1011 0111 | 2: indexbyte1, indexbyte2 | objectref, [arg1, arg2, …] → result | invoke an instance method on object objectref and put the result on the stack (might be void); the method is identified by the method reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| invokestatic | b8 | 1011 1000 | 2: indexbyte1, indexbyte2 | [arg1, arg2, …] → result | invoke a static method and put the result on the stack (might be void); the method is identified by the method reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| invokevirtual | b6 | 1011 0110 | 2: indexbyte1, indexbyte2 | objectref, [arg1, arg2, …] → result | invoke a virtual method on object objectref and put the result on the stack (might be void); the method is identified by the method reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| ior | 80 | 1000 0000 | | value1, value2 → result | bitwise int OR |
| irem | 70 | 0111 0000 | | value1, value2 → result | int remainder |
| ireturn | ac | 1010 1100 | | value → [empty] | return an integer from a method |
| ishl | 78 | 0111 1000 | | value1, value2 → result | int shift left |
| ishr | 7a | 0111 1010 | | value1, value2 → result | int arithmetic shift right |
| istore | 36 | 0011 0110 | 1: index | value → | store an int value into local variable #index |
| istore_0 | 3b | 0011 1011 | | value → | store an int value into local variable 0 |
| istore_1 | 3c | 0011 1100 | | value → | store an int value into local variable 1 |
| istore_2 | 3d | 0011 1101 | | value → | store an int value into local variable 2 |
| istore_3 | 3e | 0011 1110 | | value → | store an int value into local variable 3 |
| isub | 64 | 0110 0100 | | value1, value2 → result | int subtract |
| iushr | 7c | 0111 1100 | | value1, value2 → result | int logical shift right |
| ixor | 82 | 1000 0010 | | value1, value2 → result | int XOR |
| jsr† | a8 | 1010 1000 | 2: branchbyte1, branchbyte2 | → address | jump to subroutine at branchoffset (signed short constructed from unsigned bytes branchbyte1 << 8 \| branchbyte2) and place the return address on the stack |
| jsr_w† | c9 | 1100 1001 | 4: branchbyte1, branchbyte2, branchbyte3, branchbyte4 | → address | jump to subroutine at branchoffset (signed int constructed from unsigned bytes branchbyte1 << 24 \| branchbyte2 << 16 \| branchbyte3 << 8 \| branchbyte4) and place the return address on the stack |
| l2d | 8a | 1000 1010 | | value → result | convert a long to a double |
| l2f | 89 | 1000 1001 | | value → result | convert a long to a float |
| l2i | 88 | 1000 1000 | | value → result | convert a long to an int |
| ladd | 61 | 0110 0001 | | value1, value2 → result | add two longs |
| laload | 2f | 0010 1111 | | arrayref, index → value | load a long from an array |
| land | 7f | 0111 1111 | | value1, value2 → result | bitwise AND of two longs |
| lastore | 50 | 0101 0000 | | arrayref, index, value → | store a long into an array |
| lcmp | 94 | 1001 0100 | | value1, value2 → result | push 0 if the two longs are equal, 1 if value1 is greater than value2, -1 otherwise |
| lconst_0 | 09 | 0000 1001 | | → 0L | push 0L (the number zero with type long) onto the stack |
| lconst_1 | 0a | 0000 1010 | | → 1L | push 1L (the number one with type long) onto the stack |
| ldc | 12 | 0001 0010 | 1: index | → value | push constant #index from the constant pool (String, int, float, Class, java.lang.invoke.MethodType, java.lang.invoke.MethodHandle, or a dynamically-computed constant) onto the stack |
| ldc_w | 13 | 0001 0011 | 2: indexbyte1, indexbyte2 | → value | push constant #index from the constant pool (String, int, float, Class, java.lang.invoke.MethodType, java.lang.invoke.MethodHandle, or a dynamically-computed constant) onto the stack (the wide index is constructed as indexbyte1 << 8 \| indexbyte2) |
| ldc2_w | 14 | 0001 0100 | 2: indexbyte1, indexbyte2 | → value | push constant #index from the constant pool (double, long, or a dynamically-computed constant) onto the stack (the wide index is constructed as indexbyte1 << 8 \| indexbyte2) |
| ldiv | 6d | 0110 1101 | | value1, value2 → result | divide two longs |
| lload | 16 | 0001 0110 | 1: index | → value | load a long value from local variable #index |
| lload_0 | 1e | 0001 1110 | | → value | load a long value from local variable 0 |
| lload_1 | 1f | 0001 1111 | | → value | load a long value from local variable 1 |
| lload_2 | 20 | 0010 0000 | | → value | load a long value from local variable 2 |
| lload_3 | 21 | 0010 0001 | | → value | load a long value from local variable 3 |
| lmul | 69 | 0110 1001 | | value1, value2 → result | multiply two longs |
| lneg | 75 | 0111 0101 | | value → result | negate a long |
| lookupswitch | ab | 1010 1011 | 8+: [0–3 bytes padding], defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, npairs1, npairs2, npairs3, npairs4, match-offset pairs… | key → | look up a target address in a table using key, and continue execution from the instruction at that address |
| lor | 81 | 1000 0001 | | value1, value2 → result | bitwise OR of two longs |
| lrem | 71 | 0111 0001 | | value1, value2 → result | remainder of the division of two longs |
| lreturn | ad | 1010 1101 | | value → [empty] | return a long value |
| lshl | 79 | 0111 1001 | | value1, value2 → result | bitwise shift left of a long value1 by int value2 positions |
| lshr | 7b | 0111 1011 | | value1, value2 → result | arithmetic shift right of a long value1 by int value2 positions |
| lstore | 37 | 0011 0111 | 1: index | value → | store a long value in local variable #index |
| lstore_0 | 3f | 0011 1111 | | value → | store a long value in local variable 0 |
| lstore_1 | 40 | 0100 0000 | | value → | store a long value in local variable 1 |
| lstore_2 | 41 | 0100 0001 | | value → | store a long value in local variable 2 |
| lstore_3 | 42 | 0100 0010 | | value → | store a long value in local variable 3 |
| lsub | 65 | 0110 0101 | | value1, value2 → result | subtract two longs |
| lushr | 7d | 0111 1101 | | value1, value2 → result | unsigned (logical) shift right of a long value1 by int value2 positions |
| lxor | 83 | 1000 0011 | | value1, value2 → result | bitwise XOR of two longs |
| monitorenter | c2 | 1100 0010 | | objectref → | enter the monitor for an object (“grab the lock”, start of a synchronized() section) |
| monitorexit | c3 | 1100 0011 | | objectref → | exit the monitor for an object (“release the lock”, end of a synchronized() section) |
| multianewarray | c5 | 1100 0101 | 3: indexbyte1, indexbyte2, dimensions | count1, [count2, …] → arrayref | create a new array with the given number of dimensions, of the type identified by the class reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool; the size of each dimension is given by count1, [count2, etc.] |
| new | bb | 1011 1011 | 2: indexbyte1, indexbyte2 | → objectref | create a new object of the type identified by the class reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| newarray | bc | 1011 1100 | 1: atype | count → arrayref | create a new array with count elements of the primitive type identified by atype |
| nop | 00 | 0000 0000 | | [no change] | perform no operation |
| pop | 57 | 0101 0111 | | value → | discard the top value on the stack |
| pop2 | 58 | 0101 1000 | | {value2, value1} → | discard the top two values on the stack (or one value, if it is a double or long) |
| putfield | b5 | 1011 0101 | 2: indexbyte1, indexbyte2 | objectref, value → | set a field to value in an object objectref, where the field is identified by the field reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| putstatic | b3 | 1011 0011 | 2: indexbyte1, indexbyte2 | value → | set a static field to value in a class, where the field is identified by the field reference at index (indexbyte1 << 8 \| indexbyte2) in the constant pool |
| ret† | a9 | 1010 1001 | 1: index | [no change] | continue execution from the address taken from local variable #index (the asymmetry with jsr is intentional) |
| return | b1 | 1011 0001 | | → [empty] | return void from a method |
| saload | 35 | 0011 0101 | | arrayref, index → value | load a short from an array |
| sastore | 56 | 0101 0110 | | arrayref, index, value → | store a short into an array |
| sipush | 11 | 0001 0001 | 2: byte1, byte2 | → value | push a short onto the stack as an integer value |
| swap | 5f | 0101 1111 | | value2, value1 → value1, value2 | swap the top two words on the stack (value1 and value2 must not be double or long) |
| tableswitch | aa | 1010 1010 | 16+: [0–3 bytes padding], defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, lowbyte1, lowbyte2, lowbyte3, lowbyte4, highbyte1, highbyte2, highbyte3, highbyte4, jump offsets… | index → | continue execution from an address in the table at offset index |
| wide | c4 | 1100 0100 | 3/5: opcode, indexbyte1, indexbyte2 or iinc, indexbyte1, indexbyte2, countbyte1, countbyte2 | [same as for the corresponding instruction] | execute opcode, where opcode is one of iload, fload, aload, lload, dload, istore, fstore, astore, lstore, dstore, or ret, but with a 16-bit index; or execute iinc with a 16-bit index and a signed 16-bit increment constant |
| (no name) | cb-fd | | | | currently unassigned opcodes, reserved for future use |

† Class files with major version 51 (Java 7) or later must not contain jsr or jsr_w, so compilers do not emit them for those versions. ret is not explicitly prohibited, but it is useless without jsr and jsr_w.
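Several rows in the table assemble multi-byte operands from individual bytes. The sketch below shows how a 16-bit or 32-bit branch offset is built from those bytes. It also computes the branch target relative to the address of the branch opcode itself, and the number of padding bytes `tableswitch` and `lookupswitch` need so that `defaultbyte1` starts at a multiple of four from the start of the method's code. The class `OperandDecoding` and its method names are hypothetical.

```java
// Hypothetical helpers for decoding instruction operands.
class OperandDecoding {

    // goto, if*, jsr: signed 16-bit offset from (branchbyte1 << 8) | branchbyte2
    static int branchOffset16(byte b1, byte b2) {
        return (short) (((b1 & 0xFF) << 8) | (b2 & 0xFF)); // cast to short restores the sign
    }

    // goto_w, jsr_w: signed 32-bit offset from four unsigned bytes
    static int branchOffset32(byte b1, byte b2, byte b3, byte b4) {
        return ((b1 & 0xFF) << 24) | ((b2 & 0xFF) << 16) | ((b3 & 0xFF) << 8) | (b4 & 0xFF);
    }

    // Branch offsets are relative to the address of the branch instruction's opcode.
    static int branchTarget(int opcodeAddress, int offset) {
        return opcodeAddress + offset;
    }

    // tableswitch/lookupswitch: 0-3 padding bytes follow the opcode so that defaultbyte1
    // lands on an address that is a multiple of 4 from the start of the code array.
    static int switchPadding(int opcodeAddress) {
        return (4 - (opcodeAddress + 1) % 4) % 4;
    }

    public static void main(String[] args) {
        // A goto at code address 10 with operand bytes 0xFF 0xFB jumps back 5 bytes.
        int offset = branchOffset16((byte) 0xFF, (byte) 0xFB);
        System.out.println(offset + " -> target " + branchTarget(10, offset)); // prints "-5 -> target 5"
        // A tableswitch opcode at address 0 is followed by 3 padding bytes; defaultbyte1 is at 4.
        System.out.println(switchPadding(0)); // prints 3
    }
}
```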

Conclusion

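Java bytecode is the foundation on which the JVM runs Java programs. A solid understanding of the operand stack, the local variable array, and the common instructions helps developers see how Java programs actually execute. It also opens the door to generating and optimizing code at the bytecode level.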

References

  • List of Java bytecode instructions