Actually, the task below can be easily solved by the paper and pen.
It was written in home at evening, just to measure performance as additional argument for rewriting stupid code for improving maintainability and performance of old legacy code.
Hypothetical example of the optimization of the following code was considered, and the return of the optimization effort (as resulting 'optimized' code) was timed.
for(unsigned int i=0;i<65536;i++)
for(unsigned int j=0;j<65536;j++)
switch(j%15+1)
{
case 1:
f1();
break;
case 2:
f2();
...
}
The following are examples of performance, that can be achieved for integer massive operations due to different types of optimization.
Tried on AMD dual core DeskTop centOS, with gcc 4.1.3,
with optimization, and without.
4 types of code were considered:
Using switch(code) and case to define function to call;
Using case(code) and inlining function in call;
Using call to function by code ( fun(code) );
Just doing job without obfuscation by ftabs and cases.
Compilation done in following way:
[rtg@rtgCent cpp_exa]$ g++ -O3 case.cpp -o caseo
[rtg@rtgCent cpp_exa]$ g++ -O3 ftab.cpp -o ftabo
[rtg@rtgCent cpp_exa]$ g++ ftab.cpp -o ftab
[rtg@rtgCent cpp_exa]$ g++ case.cpp -o case
[rtg@rtgCent cpp_exa]$ su -
Some examples of output:
Red and green just presents timing for worst and best timing.
[root@rtgCent cpp_exa]# time nice -n -10 ./case
-458752
real 2m23.422s
user 2m23.205s
sys 0m0.112s
[root@rtgCent cpp_exa]# nice -n -10 ./caseo
-458752
[root@rtgCent cpp_exa]# time nice -n -10 ./caseo
-458752
real 1m3.688s
user 1m3.614s
sys 0m0.054s
[root@rtgCent cpp_exa]# time nice -n -10 ./ftab
-458752
real 1m41.664s
user 1m41.521s
sys 0m0.101s
[root@rtgCent cpp_exa]# time nice -n -10 ./ftabo
-458752
real 0m56.068s
user 0m56.004s
sys 0m0.047s
[root@rtgCent cpp_exa]# time nice -n -10 ./plaino
-458752
real 1m3.801s
user 1m3.732s
sys 0m0.047s
[root@rtgCent cpp_exa]# time nice -n -10 ./plain
-458752
real 1m53.174s
user 1m53.060s
sys 0m0.079s
[root@rtgCent cpp_exa]# time nice -n -10 ./opto
-458752
real 0m14.527s
user 0m14.509s
sys 0m0.014s
[root@rtgCent cpp_exa]# time nice -n -10 ./opt
-458752
real 0m31.956s
user 0m31.923s
sys 0m0.022s
It were just examples of the simpliest 'non algorithmic' optimization.
Simplified 'Algorithmic' optimization of the simplest case gives more than 100 time's faster output:
[root@rtgCent cpp_exa]# time nice -n -10 ./alg
-458752
real 0m0.011s
user 0m0.004s
sys 0m0.005s
[root@rtgCent cpp_exa]# time nice -n -10 ./algo
-458752
real 0m0.011s
user 0m0.003s
sys 0m0.005s
If you have some different opinion/results, please report in comments :)
By the way, who needs optimization today of critical systems, with today's pricing on retesting of system, and today modern cheap fast processors?
Code for examples:
alg.cpp
ftab.cpp
case.cpp
plain.cpp
opt.cpp
cpuinfo:
cpuinfo
1 comment:
Post a Comment