database issues sourceref references


bm-7
About the database, I still have one issue. Perhaps it is best to decide how
this issue is resolved before too much coding is done.

I started thinking about the person-source link and the fact that apparently
many people link the same source multiple times to a person because the person
appears on different pages in the source and they want to keep the relevant
notes separate.

I do not know if this was the intention of the designers, but the design of
gramps allows multiple instances of the same source, so people use it. I see
one can also attach the same photo multiple times to a person. In my opinion,
for every possible relation the programmer should strictly decide what is
meant, and forbid duplicates where they are silly, like the same picture
attached multiple times. It reduces complexity later on should the picture be
deleted. So I think it should be decided what may be duplicated and what may
not, and what we mean by duplicates (important if things change in the future,
or if reports need to know how to handle the duplicates).

One can argue that duplicate sources are not the best way. Just like
repositories are created to bundle sources, it would be nice if the source
tab of a person showed each source only once, with multiple references/pages
possible within that source. It would make later retrieval for editing easier:
if you have the same source 10 times and you found out who the other people on
a photo in this source are, which one of the 10 sourcerefs is the page the
photo is on? You start opening one after the other.
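
A minimal Python sketch of the "show each source once" idea, assuming a
person's sourcerefs can be reduced to (source handle, page) pairs. The function
name and the handles are hypothetical and are not gramps code.

```python
from collections import defaultdict


def group_pages_by_source(sourcerefs):
    """Group (source_handle, page) pairs so each source appears only once."""
    grouped = defaultdict(list)
    for source_handle, page in sourcerefs:
        grouped[source_handle].append(page)
    return dict(grouped)


# Hypothetical sourcerefs of one person: the same source is linked twice.
person_sourcerefs = [
    ("S0001", "page 12"),
    ("S0002", "page 3"),
    ("S0001", "page 47"),
]
pages_per_source = group_pages_by_source(person_sourcerefs)
```

With such a grouping the source tab could list S0001 once and show both pages
under it, while the stored sourcerefs stay separate.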

Personally I have another practical problem along the same lines: one of my
sources is "Grandma's collection of death letters" (I don't think you call
them death letters in English, but the Hollywood movies I saw never use the
word, so I don't know it). The collection is partly scanned in (the old ones
and the damaged ones especially). Some information about a person comes from
their own death letter, but for others it comes from their wife's or child's
letter. I think you see where I'm heading. I have a person about whom I know
information from the spouse's death letter. I go to source and add "Grandma's
collection of death letters", and in the text I put the name of the death
letter used.
This does not allow easy retrieval of how I got the information, as the
collection is large and has many media objects. It would be nice if I could
just add the scanned picture of the wife's letter as a media object to the
specific source reference of this person.

I suppose with the new repository object similar questions might arise.

If this is something that might be added in the future to gramps, the database
changes done now should make it easy to add this feature later, and not hinder
it.

So, a general question: should this be designed in, or is this too
far-fetched? Does anyone see a reason one would actually want two separate
person-source links?

Now the technical part: what implementing/allowing this means for the proposed
design.
From what Richard says, he goes with a design that has a unique key for a
sourceref in person, and I think it's good. At present gramps allows the
same source to be present multiple times, so this design does mean some extra
list checking is needed: if a sourceref is deleted, the reference map entry may
only be deleted if all sourcerefs to the same source are deleted.
The design suggested is:

> OK, at this level I think better in code so I have implemented what I
> think is a reasonable solution. Here goes an explanation:
>
> A single reference_map table with three keys:
>
> main key: is a tuple of
> (primary_object_handle,referenced_object_handle) and is guaranteed to
> be unique.
> secondary_key1: the primary_object_handle allows duplicates and uses BTree
> secondary_key2: the referenced_object_handle allows duplicates and
> uses BTree
>
> data: a tuple of the form: ((primary_object_class_name,
> primary_object_handle),
>                  (referenced_object_class_name,
> referenced_object_handle))
>
> The main key can be used to quickly check for the existence of a
> particular primary_object/referenced_object pair.
> The secondary_key1 can be used to lookup for deletion.
> The secondary_key2 can be used to lookup for search.
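
A minimal Python sketch of the three-key table quoted above, with plain
dictionaries standing in for the BSDDB tables. The names ReferenceMap,
referrers and delete_sourceref are hypothetical, and a person's sourcerefs are
reduced to a list of source handles for illustration. The helper at the end
shows the extra list checking mentioned before the quote.

```python
from collections import defaultdict


class ReferenceMap:
    """In-memory stand-in (assumption) for the proposed reference_map table."""

    def __init__(self):
        # main key: (primary_handle, referenced_handle) -> data tuple, unique
        self.main = {}
        # secondary_key1: primary handle -> main keys (duplicates allowed)
        self.by_primary = defaultdict(set)
        # secondary_key2: referenced handle -> main keys (duplicates allowed)
        self.by_referenced = defaultdict(set)

    def add(self, primary, referenced):
        """Both arguments are (class_name, handle) pairs."""
        key = (primary[1], referenced[1])
        self.main[key] = (primary, referenced)
        self.by_primary[key[0]].add(key)
        self.by_referenced[key[1]].add(key)

    def has(self, primary_handle, referenced_handle):
        """Existence check through the main key."""
        return (primary_handle, referenced_handle) in self.main

    def remove(self, primary_handle, referenced_handle):
        """Drop one entry and its two secondary index records."""
        key = (primary_handle, referenced_handle)
        if self.main.pop(key, None) is not None:
            self.by_primary[primary_handle].discard(key)
            self.by_referenced[referenced_handle].discard(key)

    def referrers(self, referenced_handle):
        """Backward search through secondary_key2."""
        keys = self.by_referenced.get(referenced_handle, ())
        return sorted(self.main[key][0] for key in keys)


def delete_sourceref(refmap, person_handle, sourcerefs, index):
    """Delete one sourceref from a person's list of source handles.

    The map entry goes only when no other sourceref to that source remains.
    """
    source_handle = sourcerefs.pop(index)
    if source_handle not in sourcerefs:
        refmap.remove(person_handle, source_handle)


refs = ReferenceMap()
refs.add(("Person", "p1"), ("Source", "s1"))
refs.add(("Person", "p1"), ("Source", "s1"))  # second sourceref, same entry
refs.add(("Source", "s1"), ("Media", "m1"))
```

Because the main key is the handle pair, two sourcerefs from the same person to
the same source collapse into one map entry, which is why the deletion helper
has to look at the person's remaining sourcerefs first.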

So the table is stored by a hash, which I also think is best. The design,
however, does not allow sourcerefs to have mediarefs themselves (well, it
does, you can do anything in BSDDB, remember, but the way to do it is
unnatural, I mean).
The following modification would allow this. I will explain it with an example:
We have source1, media1, media2, person1. source1 has the 2 media objects
connected to it, so there are 2 mediarefs: source1media1 and source1media2.
Person1 has source1 connected, so there is a sourceref person1source1. We want
to allow in the future that the sourceref can have a media object coupled to
it, so e.g. media2 is the page of the source where person1 is mentioned: so we
need a mediareference in the sourcereference: person1source1_media2.
The new table referencemap is created to allow backward access. So it contains:

key1, (key person1, key source1)
key2, (key source1, key media1)
key3, (key source1, key media2)

A search in this table for where source1 is used (the backward search)
immediately gives person1 as the result.

How do we make it easy to later also allow media objects connected to
sourcerefs? We need to add the following to the table:
key4, (key1 = key of sourceref person1 to source1, key media2)

A search in this table for where media2 is used immediately gives source1 and
key1 as the result.
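
The example table can be sketched in Python as follows. This is only an
illustration under assumptions: KeyedReferenceMap, add and used_by are
hypothetical names, keys are generated as "key1", "key2", ... to match the
example, and a linear scan replaces the secondary index a real table would use.

```python
import itertools


class KeyedReferenceMap:
    """Reference map in which every entry has its own unique key (sketch)."""

    def __init__(self):
        self._ids = itertools.count(1)
        # ref_key -> (owner, referenced); the owner may be an object handle
        # or the key of another reference, as in key4 of the example.
        self.entries = {}

    def add(self, owner, referenced):
        ref_key = f"key{next(self._ids)}"
        self.entries[ref_key] = (owner, referenced)
        return ref_key

    def used_by(self, referenced):
        """Backward search: owners of all entries that point to `referenced`."""
        return [owner for owner, target in self.entries.values()
                if target == referenced]


table = KeyedReferenceMap()
key1 = table.add("person1", "source1")  # sourceref of person1
key2 = table.add("source1", "media1")   # mediaref of source1
key3 = table.add("source1", "media2")   # mediaref of source1
key4 = table.add(key1, "media2")        # mediaref owned by the sourceref

where_source1 = table.used_by("source1")
where_media2 = table.used_by("media2")
```

Here the search for media2 returns both source1 and key1, and key1 can then be
looked up to reach the person1-to-source1 sourceref.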

What does the above example imply? I would use as key1 not a tuple of
(primary_object_handle,referenced_object_handle) but a truly unique key, just
like the other gramps keys. For ease this key can be kept in the sourceref. I
would do this, but note that it is not needed. By combining the two secondary
indices (possible in BSDDB) one can quickly find this key given only the
person key and the source key.
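
A minimal sketch of combining the two secondary indices, with Python sets
standing in for the BSDDB indices. The function name and the index contents
are assumptions that follow the key1 to key4 example above.

```python
def find_ref_keys(by_primary, by_referenced, owner_handle, referenced_handle):
    """Return the reference keys present in both secondary indices."""
    owned = by_primary.get(owner_handle, set())
    pointing = by_referenced.get(referenced_handle, set())
    return owned & pointing


# Secondary indices for the example table (key1..key4).
by_primary = {
    "person1": {"key1"},
    "source1": {"key2", "key3"},
    "key1": {"key4"},
}
by_referenced = {
    "source1": {"key1"},
    "media1": {"key2"},
    "media2": {"key3", "key4"},
}

sourceref_keys = find_ref_keys(by_primary, by_referenced, "person1", "source1")
```

If the same source is linked to a person more than once, the intersection
holds several keys, so the caller still has to pick the sourceref it means.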

Many things to think over, and the above can be implemented in many ways.
Should the sourceref data be moved to this reference map, it would simplify
the above construction, but I said that before, no ;-)



_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: database issues sourceref references

Don Allingham
On Fri, 2005-12-16 at 10:31 +0100, [hidden email] wrote:

> I started thinking about the person-source link and the fact that apparently
> many people link the same source multiple times to a person because the person
> appears on different pages in the source and they want to keep the relevant
> notes separate.
>
> I do not know if this was the intention of the designers, but the design of
> gramps allows multiple instances of the same source, so people use it. I see
> one can also attach the same photo multiple times to a person. In my opinion,
> for every possible relation the programmer should strictly decide what is
> meant, and forbid duplicates where they are silly, like the same picture
> attached multiple times. It reduces complexity later on should the picture be
> deleted. So I think it should be decided what may be duplicated and what may
> not, and what we mean by duplicates (important if things change in the future,
> or if reports need to know how to handle the duplicates).

This is the intended behavior. Let me give you an example:

We have a common source of information; for this example, it is a book
called "All about John Smith". We get a lot of information about "John
Smith" from this book. For example, it records the person's full name on
page 100. It also indicates on page 300 that this same "John Smith"
sometimes spelled his name as "Jon Smith".

In this case, the person's primary name would be "John Smith", and we
would attach a source reference to this source (the book) to this name.
In this source reference, we would indicate that we found the information
on page 100. We would also add the name "Jon Smith", create a link to the
same source (after all, it is the same book), and indicate on this source
reference that we found the information on page 300.

While the sourcerefs point to the same source, they are not identical.

So, kind of in a nutshell, multiple source references in an object
referring to the same source is something that we need to support, and
is something that many users are using right now.
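
The John Smith example can be sketched in Python as below. The two dataclasses
are simplified, hypothetical stand-ins and do not reproduce the real gramps
classes; the handle "S0001" is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """A link to a source plus the details specific to this one use."""
    source_handle: str
    page: str


@dataclass
class Name:
    """A name of a person, carrying its own source references."""
    text: str
    source_refs: list = field(default_factory=list)


book = "S0001"  # hypothetical handle of the book "All about John Smith"

primary_name = Name("John Smith", [SourceRef(book, "page 100")])
alternate_name = Name("Jon Smith", [SourceRef(book, "page 300")])

first_ref = primary_name.source_refs[0]
second_ref = alternate_name.source_refs[0]
```

Both references carry the same source handle, yet they compare as different
objects because the page differs, which is the distinction made above.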

> If this is something that might be added in the future to gramps, the database
> changes done now should make it easy to add this feature later, and not hinder
> it.

As someone who has worked in industry for over 20 years now, I have
found that while this may sound good in theory, many times this is not a
good idea in practice.

In projects that I have worked on (both software and hardware), the
subject comes up that "we may need to do this in the future". In almost
every case where we have "planned" features for the future, we ended up
getting a large, complex, and bloated infrastructure that everyone had
to deal with. And 9 times out of 10, the thing that we may have wanted
to do in the future never came up, and when it did, we found that the
implementation wasn't what we originally thought it might need to be. In
fact, in many cases these "prepare for the future" enhancements
prevented us from doing what we really needed to do in the future.
 

--
Don Allingham <[hidden email]>
