Sunday, March 16, 2008

An algorithm for finding clones in code:

I am looking for a way to identify clones in source code (old copy-pasted and slightly modified stuff).

I have come up with the following scheme to identify clones:

1) Index the source with a window of tokens (index sequences of tokens instead of single tokens).
2) Analyse matching sequences and expand each match to the biggest possible clone.
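
The two steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not the tool itself: the names tokenize, build_index and find_clones are made up for this example, the tokenizer is a crude regex that knows nothing about any real language, and only exact token matches are found (slightly modified copies would show up as several shorter clones).

```python
import re
from collections import defaultdict

# Crude tokenizer (an assumption): identifiers, numbers, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(text):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(text)


def build_index(tokens, window):
    """Step 1: map each run of `window` consecutive tokens to its start positions."""
    index = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        index[tuple(tokens[i:i + window])].append(i)
    return index


def find_clones(tokens, window=4):
    """Step 2: grow every pair of matching windows into the biggest clone.

    Returns a sorted list of (start_a, start_b, length) tuples.
    Overlapping matches are not filtered out in this sketch.
    """
    clones = set()
    for positions in build_index(tokens, window).values():
        for x in range(len(positions)):
            for y in range(x + 1, len(positions)):
                a, b = positions[x], positions[y]
                length = window
                # Expand to the left while the preceding tokens agree.
                while a > 0 and tokens[a - 1] == tokens[b - 1]:
                    a -= 1
                    b -= 1
                    length += 1
                # Expand to the right while the following tokens agree.
                while b + length < len(tokens) and tokens[a + length] == tokens[b + length]:
                    length += 1
                # Different seeds inside one clone collapse to the same tuple.
                clones.add((a, b, length))
    return sorted(clones)


tokens = tokenize("x = a + b ; y = c ; z = a + b ; w = 1")
print(find_clones(tokens, window=4))
```

Here the repeated fragment "= a + b ;" is reported once, as a pair of start positions plus a length in tokens. A real index would be stored on disk and keyed by a hash of the window rather than by the token tuple itself.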

I tried id-utils for indexing.
It seems that id-utils currently has no developer at the FSF.

Also, mkid segfaults when it tries to look into directories it has no permission to read.
However, it gives a good idea of how an index could look.

I am starting to write this tool, using GNU coreutils as a reference for style.
