Monday, February 18, 2008

Common Lisp Hyperspec Statistics

I downloaded the tar.gz archive of the Common Lisp HyperSpec. As its description states:
"The Common Lisp HyperSpec consumes just over 15MB of disk storage in about 2300 files. It contains approximately 105,000 hyperlinks!"

I am more interested in a different statistic: which articles are linked to most often. I should read those articles first, since they seem to be the basis for all the other stuff.

A link to the resulting statistics file is at the end of the post.

What follows is a description of how I obtained these statistics.

So first, gunzip and untar the downloaded HyperSpec.tar.gz,
then go to the extracted directory.

Check whether the number of files matches the reported one.


../HyperSpec$ ls
Body Data Front Graphics Issues
../HyperSpec$ lynx Body/[TAB]
Display all 1502 possibilities? (y or n)(n)
$ find ./ -type f | wc
2342 2342 45568
$
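To see where those files live, the count can also be broken down by top-level directory. This is only a sketch: the function name count_files_per_dir is made up for illustration, and it assumes it is run from inside the extracted HyperSpec directory.

```shell
# Hypothetical helper: print "<file count> <directory>" for each
# top-level directory of the current directory.
count_files_per_dir() {
    for dir in */; do
        # Skip the unexpanded pattern when there are no directories at all.
        [ -d "$dir" ] || continue
        # tr strips the padding that some wc implementations add.
        printf '%s %s\n' "$(find "$dir" -type f | wc -l | tr -d ' ')" "${dir%/}"
    done
}
```

Calling count_files_per_dir inside the HyperSpec directory should print one line each for Body, Data, Front, Graphics and Issues.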

Then, prepare the full document body in plain text:

$ find ./ -type f | xargs lynx --dump > temp.txt
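The command above feeds every file to lynx, including images and data files. A more careful variant, sketched below, dumps only the .htm pages and runs the dumper once per file, which also copes with unusual file names. The function name dump_html and the DUMP_CMD variable are my own inventions for illustration; by default the dumper is assumed to be lynx -dump.

```shell
# Hypothetical helper: dump every .htm file under a directory to stdout.
# DUMP_CMD is an assumed override hook; it defaults to "lynx -dump".
dump_html() {
    root=$1
    # The dumper is left unquoted on purpose so that "lynx -dump"
    # splits into a command and its option.
    find "$root" -type f -name '*.htm' -exec ${DUMP_CMD:-lynx -dump} {} \;
}
```

Usage would be something like dump_html . > temp.txt. Because non-HTML files are skipped, the link counts may differ slightly from the ones shown in this post.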

Check the number of links:

$ cat temp.txt | grep "file:" | wc
130033 260085 12115448


Prepare a CSV with the statistics:


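$ cat temp.txt | grep "file:" |
perl -e '
# Count how often each link target occurs.
while (<>) {
    # Keep the part after "file:"; it still ends with a newline.
    @a = split(/file:/);
    $hash{$a[1]}++;
}
# Print one "count;target" line per target.
for my $key (keys %hash) {
    print "$hash{$key};$key";
}
' > temp.csv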
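The same tally can be sketched with awk alone, which avoids the separate grep. This is an alternative to the Perl one-liner, not what I actually ran; the function name count_links is made up for illustration.

```shell
# Hypothetical helper: read a lynx dump on stdin and print
# "count;target" for every distinct link target.
count_links() {
    # Split each line on "file:", as the Perl version does, and count
    # the text that follows the first occurrence.
    awk -F'file:' '
        /file:/ { count[$2]++ }
        END     { for (target in count) print count[target] ";" target }
    '
}
```

Usage would be count_links < temp.txt > temp.csv. As with the Perl hash, the output order is unspecified, so the result still needs sorting.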


Now get the 20 most-linked articles:
$ cat temp.csv | sort -n | tail -20
850;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/26_glo_s.htm#symbol
1035;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/26_glo_o.htm#object
1156;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/26_glo_t.htm#type
1240;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/a_nil.htm#nil
1360;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/t
1372;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/s
2122;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/m
2190;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/StartPts.htm
2190;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/X_Master.htm
2190;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/X_Symbol.htm
2227;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/Help.htm#Disclaimer
2245;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/26_a.htm
2248;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/Contents.htm
2865;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/a
3216;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/X3J13Iss.htm
4417;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/Help.htm#Legal
4530;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/index.htm
6455;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Body/f
7121;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Issues
7679;//localhost/home/rtg/stored/programming/lisp%26emacs/HyperSpec/Front/
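The long local prefix makes this list hard to read on a small screen. Below is a sketch of a variant that puts the most-linked entries first and cuts everything up to the HyperSpec directory. The function name top_links is hypothetical, and the sed pattern assumes every path contains /HyperSpec/.

```shell
# Hypothetical helper: read "count;target" lines on stdin and print the
# N most-linked targets (default 20), highest count first.
top_links() {
    n=${1:-20}
    # Sort numerically on the first ";"-separated field, descending,
    # then drop the path prefix up to and including "/HyperSpec/".
    sort -t ';' -k1,1nr | head -n "$n" | sed 's|;.*/HyperSpec/|;|'
}
```

Usage would be top_links 20 < temp.csv.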


The complete .csv with the statistics is placed there.

Now I have to put this file on my PDA next to the HyperSpec, to use it as a reading plan for the metro. TG, Pocket Excel recognizes .csv ;)
