how are the file names determined of the html export filter
How are the file names determined by the HTML export filter, the one that
results in "My Family Tree" (Narrative website)? I get, e.g., a page name of
the form: gramps/ppl/1/A/1AJ4S766A35YICG7Z1.html
Would it be possible for a third-party program, e.g. a Perl script, to determine
this name from the GEDCOM ID I0019....? Why not name the person page after
the GEDCOM ID tag?
Re: how are the file names determined of the html export filter
On Sun, 2006-01-15 at 23:44 +0100, Richard Bos wrote:
> how are the file names determined of the html export filter, the one that
> results in "My Family Tree" (Narrative website). I get e.g a page name of
> the form: gramps/ppl/1/A/1AJ4S766A35YICG7Z1.html
> Would it be possible for a 3rd party program perl script e.g. to determine
> this name from the gedcom ID I0019....? Why not give the person page name,
> the gedcom id tag?
Believe it or not, there is a very good rationale for this.
The name of the file, in this case 1AJ4S766A35YICG7Z1.html corresponds
to the internal database key. This key is unique, and does not change.
No matter what you change on a person, including the ID value, the same
person will *always* be generated with the same file name. So, in a way,
this serves as a permanent link.
An equally important reason, if not a more important one, has to do with
maintaining your server's performance. Those of you who have worked with
servers know that file system performance can degrade significantly as the
number of files in a directory grows. If you want to test this, find a
directory with 10K files in it, and see how long it takes to get a
directory listing. The standard Linux file system, ext3, seems to handle
up to 256 files in a directory before it starts to degrade.
Going with an evenly distributed naming scheme allows us to spread the
files evenly among subdirectories, thereby decreasing access times to
the files. The database key that we use is nicely distributed across the
name space. This is also why the first two characters are used for
subdirectories (in this case ppl/1/A). You can have tens of thousands of
people generated in this way without affecting your web server
performance, since the evenly distributed names prevent too many files
from ending up in a single directory.
Alex and I actually did a significant amount of analysis on this.
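To make the layout concrete, here is a minimal sketch of how a page path could be derived from a handle. It assumes only what is described above: the first two characters of the handle name the two subdirectory levels, and the file is named after the full handle. The function name person_page_path and the root parameter are made up for illustration and are not taken from the actual report code.

```python
def person_page_path(handle: str, root: str = "ppl") -> str:
    """Return the relative page path for a database handle.

    Assumption: the first two characters of the handle name the two
    subdirectory levels, as in the example path from this thread.
    """
    if len(handle) < 2:
        raise ValueError("handle must have at least two characters")
    return f"{root}/{handle[0]}/{handle[1]}/{handle}.html"


print(person_page_path("1AJ4S766A35YICG7Z1"))
# ppl/1/A/1AJ4S766A35YICG7Z1.html
```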
Using the ID values would lose these benefits. With naming structures
like I0001, I0002, etc., you get a very poor distribution of names, and
you have no protection in case someone changes the ID value on you.
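A rough way to see the difference is to bucket both kinds of names by their first two characters and count the files per bucket. The sketch below is only an illustration: the "handles" are random 18-character strings standing in for real database keys (an assumption, not the real key generator), and the random generator is seeded so the run is repeatable.

```python
import random
import string
from collections import Counter


def bucket(name: str) -> str:
    """Two-level subdirectory chosen from the first two characters."""
    return f"{name[0]}/{name[1]}"


# Sequential IDs of the form I0001 ... I9999.
sequential = [f"I{n:04d}" for n in range(1, 10000)]

# Stand-ins for handles: random 18-character strings (an assumption,
# not the real key generator), seeded so the run is repeatable.
rng = random.Random(0)
alphabet = string.ascii_uppercase + string.digits
handles = ["".join(rng.choices(alphabet, k=18)) for _ in range(9999)]

seq_buckets = Counter(bucket(n) for n in sequential)
handle_buckets = Counter(bucket(h) for h in handles)

# Number of directories used, and size of the fullest directory.
print("sequential IDs:", len(seq_buckets), max(seq_buckets.values()))
print("random handles:", len(handle_buckets), max(handle_buckets.values()))
```

With the sequential IDs every name starts with "I" followed by one digit, so all 9999 files land in just ten directories of roughly a thousand files each, while the random stand-ins spread over many more directories with far fewer files apiece.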
However, it would be a simple plugin to map ID values to handles if you
really needed such a map. A user could probably generate this in a
matter of a few minutes.
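As a rough idea of what such a map could look like, here is a minimal sketch. It assumes you already have (ID, handle) pairs from somewhere, for instance from a small plugin; how to obtain them is not shown here. The helper names and the pairing of I0001 with the example handle are hypothetical and used only for illustration.

```python
def person_page_path(handle, root="ppl"):
    """Page path from a handle: first two characters become subdirectories."""
    return f"{root}/{handle[0]}/{handle[1]}/{handle}.html"


def build_id_map(pairs):
    """Map each ID value to the page path derived from its handle."""
    return {person_id: person_page_path(handle) for person_id, handle in pairs}


# Hypothetical input: this ID/handle pairing is invented for the example.
pairs = [("I0001", "1AJ4S766A35YICG7Z1")]
id_map = build_id_map(pairs)

# One tab-separated line per person, easy to read from another script.
for person_id, path in sorted(id_map.items()):
    print(f"{person_id}\t{path}")
```

The map has to be regenerated whenever ID values change, which is exactly the instability the handle-based file names avoid.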
PS. - I know that someone is going to say, "But you can just limit the
number of people per directory, and add directories as needed, just keep
track of which files are in which directory." This is partially true,
but with that scheme, depending on the selected ID values and the
selected people, a generated page can end up in a different directory
from run to run.