Sunday, March 2, 2014

main() - how a program starts (linux, C/C++)

Whenever you execute a program (either from a shell, or by replacing a forked child's process image with one of the exec() family of functions), the execve(2) system call is eventually invoked.
The prototype of this system call is: int execve(const char *filename, char *const argv[], char *const envp[]);

Among other things (such as verifying execute permission on filename and handling setuid), this system call counts the entries of argv to obtain argc, loads the new program image (its .text, .data and other segments are mapped into the new address space), and copies argc, argv and envp onto the new user stack. At the program's entry point, argc is stored at (%rsp), argv[0] at LP_SIZE(%rsp), and so on up to argv[argc] = NULL at (LP_SIZE*(argc+1))(%rsp). Similarly, envp[0] is at (LP_SIZE*(argc+2))(%rsp), followed by the remaining environment pointers and a terminating NULL. Here LP_SIZE is the size of a pointer in bytes (8 on x86-64).

NOTE: both argv and envp are NULL-terminated arrays.

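The three-argument form int main(int argc, char *argv[], char *envp[]); is implementation defined and specified by the platform's ABI. C99 only requires int main(void) and int main(int argc, char *argv[]) (or equivalent), and permits other implementation-defined forms; it neither blesses nor forbids the envp argument to main (Annex J.5.1 lists it as a common extension).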

On Linux, executables are created according to the ELF specification. Typically the entry point recorded in the ELF header is not main itself: some glibc startup functions (typically _start, __libc_start_main, __libc_csu_init, etc.) run before main and, among other things, pick up argc, argv and envp from the stack.

You can examine the binaries produced on Linux with binutils (notably the readelf and objdump utilities, among others).

The following example is a simple C program that prints its command-line arguments and environment variables.

#include <stdio.h>
 
int main(int argc, char *argv[], char *envp[]) {
    int i;
     
    // dump args
    for (i = 0; i < argc; ++i) {
        printf("%d = %s\n", i, argv[i]);
    }
 
    // dump environment
    for (i = 0; envp[i] != NULL; ++i) {
        printf ("%s\n", envp[i]);
    }
 
    return 0;
}

Assuming the above program is saved as argc.c, it can be compiled with the command
gcc -o argc argc.c
This should give you an executable file argc (an executable ELF object).

Now you can disassemble the executable sections of argc with the objdump utility from the binutils package:
objdump -d argc > argc.objdump
This writes output like the following to argc.objdump (the exact output depends on your architecture, compiler and glibc version).

argc:     file format elf64-x86-64
 
 
Disassembly of section .init:
 
0000000000400418 <_init>:
  400418: 48 83 ec 08           sub    $0x8,%rsp
  40041c: 48 8b 05 d5 0b 20 00  mov    0x200bd5(%rip),%rax        # 600ff8 <_DYNAMIC+0x1d0>
  400423: 48 85 c0              test   %rax,%rax
  400426: 74 05                 je     40042d <_init+0x15>
  400428: e8 53 00 00 00        callq  400480 <__gmon_start__@plt>
  40042d: 48 83 c4 08           add    $0x8,%rsp
  400431: c3                    retq  
 
Disassembly of section .plt:
 
0000000000400440 <puts@plt-0x10>:
  400440: ff 35 c2 0b 20 00     pushq  0x200bc2(%rip)        # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
  400446: ff 25 c4 0b 20 00     jmpq   *0x200bc4(%rip)        # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
  40044c: 0f 1f 40 00           nopl   0x0(%rax)
 
0000000000400450 <puts@plt>:
  400450: ff 25 c2 0b 20 00     jmpq   *0x200bc2(%rip)        # 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
  400456: 68 00 00 00 00        pushq  $0x0
  40045b: e9 e0 ff ff ff        jmpq   400440 <_init+0x28>
 
0000000000400460 <printf@plt>:
  400460: ff 25 ba 0b 20 00     jmpq   *0x200bba(%rip)        # 601020 <_GLOBAL_OFFSET_TABLE_+0x20>
  400466: 68 01 00 00 00        pushq  $0x1
  40046b: e9 d0 ff ff ff        jmpq   400440 <_init+0x28>
 
0000000000400470 <__libc_start_main@plt>:
  400470: ff 25 b2 0b 20 00     jmpq   *0x200bb2(%rip)        # 601028 <_GLOBAL_OFFSET_TABLE_+0x28>
  400476: 68 02 00 00 00        pushq  $0x2
  40047b: e9 c0 ff ff ff        jmpq   400440 <_init+0x28>
 
0000000000400480 <__gmon_start__@plt>:
  400480: ff 25 aa 0b 20 00     jmpq   *0x200baa(%rip)        # 601030 <_GLOBAL_OFFSET_TABLE_+0x30>
  400486: 68 03 00 00 00        pushq  $0x3
  40048b: e9 b0 ff ff ff        jmpq   400440 <_init+0x28>
 
Disassembly of section .text:
 
0000000000400490 <_start>:
  400490: 31 ed                 xor    %ebp,%ebp
  400492: 49 89 d1              mov    %rdx,%r9
  400495: 5e                    pop    %rsi
  400496: 48 89 e2              mov    %rsp,%rdx
  400499: 48 83 e4 f0           and    $0xfffffffffffffff0,%rsp
  40049d: 50                    push   %rax
  40049e: 54                    push   %rsp
  40049f: 49 c7 c0 a0 06 40 00  mov    $0x4006a0,%r8
  4004a6: 48 c7 c1 30 06 40 00  mov    $0x400630,%rcx
  4004ad: 48 c7 c7 80 05 40 00  mov    $0x400580,%rdi
  4004b4: e8 b7 ff ff ff        callq  400470 <__libc_start_main@plt>
  4004b9: f4                    hlt   
  4004ba: 66 90                 xchg   %ax,%ax
  4004bc: 0f 1f 40 00           nopl   0x0(%rax)
 
00000000004004c0 <deregister_tm_clones>:
  4004c0: b8 47 10 60 00        mov    $0x601047,%eax
  4004c5: 55                    push   %rbp
  4004c6: 48 2d 40 10 60 00     sub    $0x601040,%rax
  4004cc: 48 83 f8 0e           cmp    $0xe,%rax
  4004d0: 48 89 e5              mov    %rsp,%rbp
  4004d3: 77 02                 ja     4004d7 <deregister_tm_clones+0x17>
  4004d5: 5d                    pop    %rbp
  4004d6: c3                    retq  
  4004d7: b8 00 00 00 00        mov    $0x0,%eax
  4004dc: 48 85 c0              test   %rax,%rax
  4004df: 74 f4                 je     4004d5 <deregister_tm_clones+0x15>
  4004e1: 5d                    pop    %rbp
  4004e2: bf 40 10 60 00        mov    $0x601040,%edi
  4004e7: ff e0                 jmpq   *%rax
  4004e9: 0f 1f 80 00 00 00 00  nopl   0x0(%rax)
 
00000000004004f0 <register_tm_clones>:
  4004f0: b8 40 10 60 00        mov    $0x601040,%eax
  4004f5: 55                    push   %rbp
  4004f6: 48 2d 40 10 60 00     sub    $0x601040,%rax
  4004fc: 48 c1 f8 03           sar    $0x3,%rax
  400500: 48 89 e5              mov    %rsp,%rbp
  400503: 48 89 c2              mov    %rax,%rdx
  400506: 48 c1 ea 3f           shr    $0x3f,%rdx
  40050a: 48 01 d0              add    %rdx,%rax
  40050d: 48 d1 f8              sar    %rax
  400510: 75 02                 jne    400514 <register_tm_clones+0x24>
  400512: 5d                    pop    %rbp
  400513: c3                    retq  
  400514: ba 00 00 00 00        mov    $0x0,%edx
  400519: 48 85 d2              test   %rdx,%rdx
  40051c: 74 f4                 je     400512 <register_tm_clones+0x22>
  40051e: 5d                    pop    %rbp
  40051f: 48 89 c6              mov    %rax,%rsi
  400522: bf 40 10 60 00        mov    $0x601040,%edi
  400527: ff e2                 jmpq   *%rdx
  400529: 0f 1f 80 00 00 00 00  nopl   0x0(%rax)
 
0000000000400530 <__do_global_dtors_aux>:
  400530: 80 3d 05 0b 20 00 00  cmpb   $0x0,0x200b05(%rip)        # 60103c <_edata>
  400537: 75 11                 jne    40054a <__do_global_dtors_aux+0x1a>
  400539: 55                    push   %rbp
  40053a: 48 89 e5              mov    %rsp,%rbp
  40053d: e8 7e ff ff ff        callq  4004c0 <deregister_tm_clones>
  400542: 5d                    pop    %rbp
  400543: c6 05 f2 0a 20 00 01  movb   $0x1,0x200af2(%rip)        # 60103c <_edata>
  40054a: f3 c3                 repz retq
  40054c: 0f 1f 40 00           nopl   0x0(%rax)
 
0000000000400550 <frame_dummy>:
  400550: 48 83 3d c8 08 20 00  cmpq   $0x0,0x2008c8(%rip)        # 600e20 <__jcr_end__>
  400557: 00
  400558: 74 1e                 je     400578 <frame_dummy+0x28>
  40055a: b8 00 00 00 00        mov    $0x0,%eax
  40055f: 48 85 c0              test   %rax,%rax
  400562: 74 14                 je     400578 <frame_dummy+0x28>
  400564: 55                    push   %rbp
  400565: bf 20 0e 60 00        mov    $0x600e20,%edi
  40056a: 48 89 e5              mov    %rsp,%rbp
  40056d: ff d0                 callq  *%rax
  40056f: 5d                    pop    %rbp
  400570: e9 7b ff ff ff        jmpq   4004f0 <register_tm_clones>
  400575: 0f 1f 00              nopl   (%rax)
  400578: e9 73 ff ff ff        jmpq   4004f0 <register_tm_clones>
  40057d: 0f 1f 00              nopl   (%rax)
 
0000000000400580 <main>:
  400580: 55                    push   %rbp
  400581: 48 89 e5              mov    %rsp,%rbp
  400584: 48 83 ec 30           sub    $0x30,%rsp
  400588: 89 7d ec              mov    %edi,-0x14(%rbp)
  40058b: 48 89 75 e0           mov    %rsi,-0x20(%rbp)
  40058f: 48 89 55 d8           mov    %rdx,-0x28(%rbp)
  400593: c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
  40059a: eb 2f                 jmp    4005cb <main+0x4b>
  40059c: 8b 45 fc              mov    -0x4(%rbp),%eax
  40059f: 48 98                 cltq  
  4005a1: 48 8d 14 c5 00 00 00  lea    0x0(,%rax,8),%rdx
  4005a8: 00
  4005a9: 48 8b 45 e0           mov    -0x20(%rbp),%rax
  4005ad: 48 01 d0              add    %rdx,%rax
  4005b0: 48 8b 10              mov    (%rax),%rdx
  4005b3: 8b 45 fc              mov    -0x4(%rbp),%eax
  4005b6: 89 c6                 mov    %eax,%esi
  4005b8: bf c0 06 40 00        mov    $0x4006c0,%edi
  4005bd: b8 00 00 00 00        mov    $0x0,%eax
  4005c2: e8 99 fe ff ff        callq  400460 <printf@plt>
  4005c7: 83 45 fc 01           addl   $0x1,-0x4(%rbp)
  4005cb: 8b 45 fc              mov    -0x4(%rbp),%eax
  4005ce: 3b 45 ec              cmp    -0x14(%rbp),%eax
  4005d1: 7c c9                 jl     40059c <main+0x1c>
  4005d3: c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
  4005da: eb 23                 jmp    4005ff <main+0x7f>
  4005dc: 8b 45 fc              mov    -0x4(%rbp),%eax
  4005df: 48 98                 cltq  
  4005e1: 48 8d 14 c5 00 00 00  lea    0x0(,%rax,8),%rdx
  4005e8: 00
  4005e9: 48 8b 45 d8           mov    -0x28(%rbp),%rax
  4005ed: 48 01 d0              add    %rdx,%rax
  4005f0: 48 8b 00              mov    (%rax),%rax
  4005f3: 48 89 c7              mov    %rax,%rdi
  4005f6: e8 55 fe ff ff        callq  400450 <puts@plt>
  4005fb: 83 45 fc 01           addl   $0x1,-0x4(%rbp)
  4005ff: 8b 45 fc              mov    -0x4(%rbp),%eax
  400602: 48 98                 cltq  
  400604: 48 8d 14 c5 00 00 00  lea    0x0(,%rax,8),%rdx
  40060b: 00
  40060c: 48 8b 45 d8           mov    -0x28(%rbp),%rax
  400610: 48 01 d0              add    %rdx,%rax
  400613: 48 8b 00              mov    (%rax),%rax
  400616: 48 85 c0              test   %rax,%rax
  400619: 75 c1                 jne    4005dc <main+0x5c>
  40061b: b8 00 00 00 00        mov    $0x0,%eax
  400620: c9                    leaveq
  400621: c3                    retq  
  400622: 66 2e 0f 1f 84 00 00  nopw   %cs:0x0(%rax,%rax,1)
  400629: 00 00 00
  40062c: 0f 1f 40 00           nopl   0x0(%rax)
 
0000000000400630 <__libc_csu_init>:
  400630: 41 57                 push   %r15
  400632: 41 89 ff              mov    %edi,%r15d
  400635: 41 56                 push   %r14
  400637: 49 89 f6              mov    %rsi,%r14
  40063a: 41 55                 push   %r13
  40063c: 49 89 d5              mov    %rdx,%r13
  40063f: 41 54                 push   %r12
  400641: 4c 8d 25 c8 07 20 00  lea    0x2007c8(%rip),%r12        # 600e10 <__frame_dummy_init_array_entry>
  400648: 55                    push   %rbp
  400649: 48 8d 2d c8 07 20 00  lea    0x2007c8(%rip),%rbp        # 600e18 <__init_array_end>
  400650: 53                    push   %rbx
  400651: 4c 29 e5              sub    %r12,%rbp
  400654: 31 db                 xor    %ebx,%ebx
  400656: 48 c1 fd 03           sar    $0x3,%rbp
  40065a: 48 83 ec 08           sub    $0x8,%rsp
  40065e: e8 b5 fd ff ff        callq  400418 <_init>
  400663: 48 85 ed              test   %rbp,%rbp
  400666: 74 1e                 je     400686 <__libc_csu_init+0x56>
  400668: 0f 1f 84 00 00 00 00  nopl   0x0(%rax,%rax,1)
  40066f: 00
  400670: 4c 89 ea              mov    %r13,%rdx
  400673: 4c 89 f6              mov    %r14,%rsi
  400676: 44 89 ff              mov    %r15d,%edi
  400679: 41 ff 14 dc           callq  *(%r12,%rbx,8)
  40067d: 48 83 c3 01           add    $0x1,%rbx
  400681: 48 39 eb              cmp    %rbp,%rbx
  400684: 75 ea                 jne    400670 <__libc_csu_init+0x40>
  400686: 48 83 c4 08           add    $0x8,%rsp
  40068a: 5b                    pop    %rbx
  40068b: 5d                    pop    %rbp
  40068c: 41 5c                 pop    %r12
  40068e: 41 5d                 pop    %r13
  400690: 41 5e                 pop    %r14
  400692: 41 5f                 pop    %r15
  400694: c3                    retq  
  400695: 66 66 2e 0f 1f 84 00  data32 nopw %cs:0x0(%rax,%rax,1)
  40069c: 00 00 00 00
 
00000000004006a0 <__libc_csu_fini>:
  4006a0: f3 c3                 repz retq
  4006a2: 66 90                 xchg   %ax,%ax
 
Disassembly of section .fini:
 
00000000004006a4 <_fini>:
  4006a4: 48 83 ec 08           sub    $0x8,%rsp
  4006a8: 48 83 c4 08           add    $0x8,%rsp
  4006ac: c3                    retq  

The disassembly is divided into sections, as specified by the platform-specific ABI. On Linux, our argc program has the following sections containing executable instructions:

.init: process initialization code.
.plt: the procedure linkage table.
.text: the program text, i.e. the executable instructions of the program.
.fini: process finalization code.

All the sections of an ELF object can be listed with
objdump --section-headers argc

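As can be seen in the disassembly above, execution starts at _start, not main. _start clears %ebp, moves the rtld_fini pointer it received in %rdx into %r9, pops argc off the stack into %rsi, and takes the resulting %rsp (now pointing at argv[0]) as argv in %rdx. It then aligns the stack to 16 bytes, pushes %rax (padding) and %rsp (stack_end), loads the addresses of __libc_csu_fini, __libc_csu_init and main into %r8, %rcx and %rdi, and calls __libc_start_main. Following the x86-64 calling convention, the first six arguments are passed in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, and the seventh (stack_end) on the stack.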

__libc_start_main takes the following arguments:
1. the address of the main function,
2. argc,
3. argv,
4. init (here __libc_csu_init),
5. fini (here __libc_csu_fini),
6. rtld_fini, and
7. stack_end.

It is responsible for finally calling main() with the appropriate arguments. A lot goes on inside __libc_start_main; please read glibc's source code for more details.

Monday, February 10, 2014

Java launcher debug

If you want the Java launcher utilities (java, javac, etc.) in your development environment to print debug information, set the _JAVA_LAUNCHER_DEBUG environment variable. Note that its value is irrelevant as long as it is set to something, so to disable debugging you have to unset it. The following examples use javac and java.

Compile a Java program

Note the details below: the launcher state variables (e.g. the full Java version), the command-line arguments, the config values read from jvm.cfg, the path of libjvm.so, and the JavaVM arguments. These are very handy when debugging a build.

$ (export _JAVA_LAUNCHER_DEBUG=1 \
&& javac TowerOfHanoi.java \
&& unset _JAVA_LAUNCHER_DEBUG)
----_JAVA_LAUNCHER_DEBUG----
Launcher state:
        debug:on
        javargs:on
        program name:java
        launcher name:java
        javaw:off
        fullversion:1.7.0_45-b18
        dotversion:1.7
        ergo_policy:NEVER_ACT_AS_A_SERVER_CLASS_MACHINE
Command line args:
argv[0] = javac
argv[1] = TowerOfHanoi.java
JRE path is /usr/java/jdk1.7.0_45/jre
jvm.cfg[0] = ->-server<-
jvm.cfg[1] = ->-client<-
jvm.cfg[2] = ->-hotspot<-
jvm.cfg[3] = ->-classic<-
jvm.cfg[4] = ->-native<-
jvm.cfg[5] = ->-green<-
1 micro seconds to parse jvm.cfg
Default VM: server
Does `/usr/java/jdk1.7.0_45/jre/lib/amd64/server/libjvm.so' exist ... yes.
mustsetenv: FALSE
JVM path is /usr/java/jdk1.7.0_45/jre/lib/amd64/server/libjvm.so
1 micro seconds to LoadJavaVM
JavaVM args:
    version 0x00010002, ignoreUnrecognized is JNI_FALSE, nOptions is 7
    option[ 0] = '-Dsun.java.launcher.diag=true'
    option[ 1] = '-Dapplication.home=/usr/java/jdk1.7.0_45'
    option[ 2] = '-Djava.class.path=/usr/java/jdk1.7.0_45/lib/tools.jar:/usr/java/jdk1.7.0_45/classes'
    option[ 3] = '-Xms8m'
    option[ 4] = '-Dsun.java.command=com.sun.tools.javac.Main TowerOfHanoi.java'
    option[ 5] = '-Dsun.java.launcher=SUN_STANDARD'
    option[ 6] = '-Dsun.java.launcher.pid=18510'
1 micro seconds to InitializeJVM
Main class is 'com.sun.tools.javac.Main'
App's argc is 1
    argv[ 0] = 'TowerOfHanoi.java'
1 micro seconds to load main class
----_JAVA_LAUNCHER_DEBUG----

Run the Java program

Similarly, we see all of the above when running a Java program.

$ (export _JAVA_LAUNCHER_DEBUG=1 \
&& java TowerOfHanoi \
&& unset _JAVA_LAUNCHER_DEBUG)
----_JAVA_LAUNCHER_DEBUG----
Launcher state:
        debug:on
        javargs:off
        program name:java
        launcher name:java
        javaw:off
        fullversion:1.7.0_45-b18
        dotversion:1.7
        ergo_policy:DEFAULT_ERGONOMICS_POLICY
Command line args:
argv[0] = java
argv[1] = TowerOfHanoi
JRE path is /usr/java/jdk1.7.0_45/jre
jvm.cfg[0] = ->-server<-
jvm.cfg[1] = ->-client<-
jvm.cfg[2] = ->-hotspot<-
jvm.cfg[3] = ->-classic<-
jvm.cfg[4] = ->-native<-
jvm.cfg[5] = ->-green<-
1 micro seconds to parse jvm.cfg
Default VM: server
Does `/usr/java/jdk1.7.0_45/jre/lib/amd64/server/libjvm.so' exist ... yes.
mustsetenv: FALSE
JVM path is /usr/java/jdk1.7.0_45/jre/lib/amd64/server/libjvm.so
1 micro seconds to LoadJavaVM
JavaVM args:
    version 0x00010002, ignoreUnrecognized is JNI_FALSE, nOptions is 5
    option[ 0] = '-Dsun.java.launcher.diag=true'
    option[ 1] = '-Djava.class.path=.'
    option[ 2] = '-Dsun.java.command=TowerOfHanoi'
    option[ 3] = '-Dsun.java.launcher=SUN_STANDARD'
    option[ 4] = '-Dsun.java.launcher.pid=18692'
1 micro seconds to InitializeJVM
Main class is 'TowerOfHanoi'
App's argc is 0
1 micro seconds to load main class
----_JAVA_LAUNCHER_DEBUG----
##  Move disk 1 from A to B
##  Move disk 2 from A to C
##  Move disk 1 from B to C
##  Move disk 3 from A to B
##  Move disk 1 from C to A
##  Move disk 2 from C to B
##  Move disk 1 from A to B
##  Move disk 4 from A to C
##  Move disk 1 from B to C
##  Move disk 2 from B to A
##  Move disk 1 from C to A
##  Move disk 3 from B to C
##  Move disk 1 from A to B
##  Move disk 2 from A to C
##  Move disk 1 from B to C

Java program used above

public class TowerOfHanoi {
    // Move n disks from peg `from` to peg `to`, using `mid` as the spare peg.
    public void move(int n, char from, char mid, char to) {
        if (n < 1) {
            return;
        }

        move(n - 1, from, to, mid);  // park the n-1 smaller disks on mid
        System.out.println("##  Move disk " + n + " from " + from + " to " + to);
        move(n - 1, mid, from, to);  // stack them back on top of disk n
    }

    public static void main(String[] args) {
        TowerOfHanoi toh = new TowerOfHanoi();
        toh.move(4, 'A', 'B', 'C');
    }
}