Storing data from large sources

Gerald Britton-2
I want to open up a discussion about how best to store data from large
sources.  By "large" I mean sources such as registers, logs, family
bibles, censuses, member lists and other things that contain many
entries (millions, in the case of a census).  Usually, each entry in
such a document has several columns of data.  For example, a marriage
register at a church will have names of the bride, groom, witnesses
and possibly parents, date, officiating minister's name and other
things.  A death register may include cause of death, place and other
things.  A census may have all sorts of data, including whether the
house was brick or frame and how many stories it had, if the person
was an employer (and if so, how many employees he had) or employee,
whether he was deaf, dumb, blind, crazy (or "lunatic") and how many
sheep he had.

One challenge with this sort of data is the document in which it is
found.  If we treat it as a source (which seems natural to me) then we
have a problem storing all the bits of data we find for one
individual.  You can't use source attributes, since those are shared
in Gramps.  We could treat the document as a repository and the
entries as sources from that repository, but that seems unnatural when
you are looking at a book or a bible or a microfilm.  Also, since a
repository can't have a repository (i.e. you can't nest them) and the
real repository (building, web site, etc.) may house many such
sources, how can you bring all these pseudo-repositories together
under that real repository?

Another challenge is all the bits of data we find in this document.
Some should no doubt find their way to other objects: the cause of
death to a death event attribute; the witness names to a marriage
event attribute; the house construction and size to a residence
attribute, perhaps.  However, it is still good (very good, I feel) to
also keep all this data together and tie it to the source in which it
is found.
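To make the idea concrete, here is a minimal Python sketch of filing one register entry: columns with a natural home are copied to other objects, while the complete entry stays together on a single citation of the source. The classes, the `ROUTES` table and `file_entry` are hypothetical illustrations, not part of the Gramps object model.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins; these are not Gramps classes.
@dataclass
class SourceRef:
    source_title: str
    page: str
    transcript: dict = field(default_factory=dict)  # the full entry, kept together

@dataclass
class Target:
    kind: str  # e.g. "death event", "marriage event", "residence"
    attributes: dict = field(default_factory=dict)
    citations: list = field(default_factory=list)

# Hypothetical routing table: which column ends up on which kind of object.
ROUTES = {
    "Cause of death": "death event",
    "Witnesses": "marriage event",
    "House construction": "residence",
    "Stories": "residence",
}

def file_entry(entry, source_title, page):
    """Copy routable columns to target objects; keep the whole entry on one SourceRef."""
    ref = SourceRef(source_title, page, transcript=dict(entry))
    targets = {}
    for column, value in entry.items():
        kind = ROUTES.get(column)
        if kind is None:
            continue  # no natural home: the value survives only in the transcript
        if kind not in targets:
            # every target cites the same reference, so all data stays tied to it
            targets[kind] = Target(kind, citations=[ref])
        targets[kind].attributes[column] = value
    return ref, targets

# Illustrative census entry (invented values).
entry = {"Name": "A. Example", "House construction": "Brick", "Stories": "2", "Sheep": "40"}
ref, targets = file_entry(entry, "Example census", "p. 3")
```

Here the "Sheep" column has no target, but it is not lost: it remains in the transcript attached to the citation.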

The Census gramplet addresses this by using event reference
attributes.  So the fact that the person was a lunatic is recorded in
an attribute of the event reference -- Lunatic: yes -- and similarly,
the other attributes.  This solves the immediate problem for censuses
but may not be generally extensible to other documents -- especially
if there are multiple documents for some event that disagree with each
other.  Furthermore, it is not clear to me that this is the best way
to handle this data, since the data is not really an attribute of the
event but of the source document, in which it was recorded at the time
of the event.

I'm now wondering if we should add attributes to source references
analogous to event references.  If available, this would be a natural
place to store all the bits of data for each entry while keeping one
source object (the book, film, etc.) at the repositories where it can
be found. On the other hand, introducing source reference attributes
may introduce challenges for GEDCOM exports and imports.
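A minimal sketch of what source reference attributes might look like, assuming simplified hypothetical classes (these are not the real Gramps library classes, and `values_for` is an invented helper). Because each reference carries its own attributes, two documents that disagree about the same event can both be kept, each tied to the page it came from.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified classes illustrating the proposal; they are not
# the actual Gramps library classes.
@dataclass
class Attribute:
    key: str
    value: str

@dataclass
class Source:
    title: str  # the book, register or film; shared by many references

@dataclass
class SourceRef:
    source: Source
    page: str
    attributes: list = field(default_factory=list)  # proposed: data for one entry

@dataclass
class Event:
    description: str
    citations: list = field(default_factory=list)

def values_for(event, key):
    """List (source title, page, value) for one key across all of an event's citations."""
    return [
        (ref.source.title, ref.page, attr.value)
        for ref in event.citations
        for attr in ref.attributes
        if attr.key == key
    ]

# Two documents that disagree; both values are kept, each with its own citation.
parish = Source("Parish burial register")
civil = Source("Civil death register")
death = Event("Death of A. Example")
death.citations.append(SourceRef(parish, "p. 14", [Attribute("Cause of death", "Fever")]))
death.citations.append(SourceRef(civil, "entry 203", [Attribute("Cause of death", "Typhus")]))
```

The shared `Source` objects stay at their repositories; only the per-entry data lives on the references.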

So let's discuss!  What creative ways can we devise to handle these
sorts of source documents?  Should we extend our data model and, if
so, how?  If we extend the data model, what are the repercussions?

--
Gerald Britton

------------------------------------------------------------------------------
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: Storing data from large sources

Benny Malengier


2010/11/29 Gerald Britton <[hidden email]>

For me,

1. A repository is where you find a source. We should not misuse it.

2. A source is the book/register, or a part of it. The source holds information, and literal transcripts of a source should therefore be stored in this object.
A source does _not_ have what we call attributes; a source has "Data". This is not exported to GEDCOM. The Data is not shared; the source is what is shared.

3. An event is something happening to a person/family at a certain time/place. The census event is the census taker passing by and writing the information into the census source.

4. From a source you learn information about a person or family, so you want to add that information to the person/family object, e.g. an attribute Description: Blue eyes. The source of this attribute is the census source.

I don't see problems here, except that if you want the census data itself stored, you can only store it in the source as a note. So there is no 'database schema' for it. You can use Source Data for key-value pairs.

Now, the other way around. You have a person, and you see a source saying green eyes. You go to the attributes and you see blue eyes. You wonder whether there is an error. You click on the attribute to see which source this information came from, you open the census source, and you look at the data inside it. If you used a note for the census data, you can share it in the source reference, and you know what the census said. If you are uncertain and want to recheck the census, you go to the repository tab to see where this census is stored, so you can check the source again at the repository.
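The model in points 1 to 4 and the lookup just described can be sketched as follows. This is a minimal illustration with hypothetical classes and a hypothetical `trace` helper, not the Gramps API: the transcript lives in the source's note, the person's attribute cites the source, and the source points at its repository.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified model of points 1-4 above; not the Gramps API.
@dataclass
class Repository:
    name: str  # where the source can be found

@dataclass
class Source:
    title: str
    repository: Repository
    data: dict = field(default_factory=dict)  # key/value "Source Data"
    note: str = ""  # literal transcript

@dataclass
class Attribute:
    key: str
    value: str
    source: Optional[Source] = None  # the source this was learned from

@dataclass
class Person:
    name: str
    attributes: list = field(default_factory=list)

def trace(person, key):
    """Yield (value, source title, transcript, repository) for a person's attribute."""
    for attr in person.attributes:
        if attr.key == key and attr.source is not None:
            src = attr.source
            yield attr.value, src.title, src.note, src.repository.name

archives = Repository("National Archives")
census = Source("Example census", archives, note="Eyes: green")
person = Person("A. Example", [Attribute("Description", "Blue eyes", census)])
```

Following `trace(person, "Description")` puts the recorded value ("Blue eyes") next to the transcript ("Eyes: green") and the repository where the original can be rechecked.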

So, in all this, you normally _don't_ check the census event! It seems stupid to me to store data obtained during the census taking there. At most, I would share a note with the transcript there.

So, in my view, the way the Census gramplet works is wrong.

Benny


Re: Storing data from large sources

Nick Hall-6
In reply to this post by Gerald Britton-2
Gerald,

I have some comments regarding census data and the census add-ons.


Gerald Britton wrote:

> I want to open up a discussion about how best to store data from large
> sources.  By "large" I mean sources such as registers, logs, family
> bibles, censuses, member lists and other things that contain many
> entries (millions, in the case of a census).  Usually, each entry in
> such a document has several columns of data.  For example, a marriage
> register at a church will have names of the bride, groom, witnesses
> and possibly parents, date, officiating minister's name and other
> things.  A death register may include cause of death, place and other
> things.  A census may have all sorts of data, including whether the
> house was brick or frame and how many stories it had, if the person
> was an employer (and if so, how many employees he had) or employee,
> whether he was deaf, dumb, blind, crazy (or "lunatic") and how many
> sheep he has.
>
> One challenge with this sort of data is the document in which it is
> found.  If we treat it as a source (which seems natural to me) then we
> have a problem storing all the bits of data we find for one
> individual.  

A census is clearly an event.


> You can't use source attributes, since those are shared
> in gramps.  We could treat the document as a repository and the
> entries as sources from that repository, but that seems unnatural when
> you are looking at a book or a bible or a microfilm.  Also, since a
> repository can't have a repository (i.e. you can't nest them) and the
> real repository (building, web site, etc.) may house many such
> sources, how can you bring all these pseudo-repositories together
> under that real repository?
>
> Another challenge is all the bits of data we find in this document.
> Some should no doubt find their way to other objects: the cause of
> death to a death event attribute; the witness names to a marriage
> event attribute; the house construction and size to a residence
> attribute, perhaps.  However, it is still good (very good, I feel) to
> also keep all this data together and tie it to the source in which it
> is found.
>
> The Census gramplet addresses this by using event reference
> attributes.  So the fact that the person was a lunatic is recorded in
> an attribute of the event reference -- Lunatic: yes -- and similarly,
> the other attributes.  This solves the immediate problem for censuses
> but may not be generally extensible to other documents -- especially
> if there are multiple documents for some event that disagree with each
> other.  

Yes, a census is a case where I can't see a census event ever having
more than one source. In fact, if the census gramplet is used then only
one census source is allowed.


> Furthermore, it is not clear to me that this is the best way
> to handle this data since the data is not really an attribute of the
> event but of the source document since the data was recorded in the
> source document at the time of the event.
>  

I see something like "place of birth" to be an attribute of the person
and census event. An attribute in the event reference object is a
natural and convenient place to store this data.

I see something like "number of rooms in the house" to be an attribute
of the census event only.
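A minimal sketch of that split, using hypothetical simplified classes rather than the Gramps API (`census_row` is an invented helper): household-level values such as the number of rooms sit on the shared census event, while person-level values such as place of birth sit on each person's event reference.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified classes (not the Gramps API).
@dataclass
class Event:
    kind: str
    attributes: dict = field(default_factory=dict)  # values about the event itself

@dataclass
class EventRef:
    event: Event
    attributes: dict = field(default_factory=dict)  # values about one person in it

@dataclass
class Person:
    name: str
    event_refs: list = field(default_factory=list)

def census_row(person, event):
    """Combine event-level and person-level values for one person in one census."""
    row = {}
    for ref in person.event_refs:
        if ref.event is event:
            row.update(event.attributes)
            row.update(ref.attributes)
    return row

census = Event("Census", {"Rooms": "4"})
head = Person("A. Example", [EventRef(census, {"Place of birth": "Kent"})])
wife = Person("B. Example", [EventRef(census, {"Place of birth": "Essex"})])
```

The number of rooms is stored once and seen by everyone in the household, while each person keeps their own place of birth.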


> I'm now wondering if we should add attributes to source references
> analogous to event references.  If available, this would be a natural
> place to store all the bits of data for each entry while keeping one
> source object (the book, film, etc.) at the repositories where it can
> be found. On the other hand, introducing source reference attributes
> may introduce challenges for GEDCOM exports and imports.
>  

From a census point of view, I don't see a requirement for attributes
on a source reference object.

I have had a request to generate a transcript from the census data and
record it as a Note in the source reference. This is a good idea, but I
have not yet implemented it.
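A transcript note could be generated along these lines. This is a minimal sketch, assuming each person's census values are available as a dict of column name to string value; `make_transcript` and the column names are hypothetical and not part of the census add-ons.

```python
def make_transcript(columns, rows):
    """Render census rows (dicts of column -> value) as a plain-text table."""
    # Each column is as wide as its header or its longest value.
    widths = [max([len(col)] + [len(row.get(col, "")) for row in rows]) for col in columns]

    def fmt(values):
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    lines = [fmt(columns), fmt(["-" * w for w in widths])]
    for row in rows:
        lines.append(fmt([row.get(col, "") for col in columns]))
    return "\n".join(lines)

columns = ["Name", "Relation", "Age"]
rows = [
    {"Name": "A. Example", "Relation": "Head", "Age": "40"},
    {"Name": "B. Example", "Relation": "Wife", "Age": "38"},
]
note_text = make_transcript(columns, rows)
```

The resulting text could then be stored as a Note on the source reference for that page.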

I can also see the case for storing an image of the census page in the
source reference. Unfortunately this is not possible, so I have suggested
we store it on the census event. That is not yet implemented either.


> So let's discuss!  What creative ways can we devise to handle these
> sorts of source documents?  Should we extend our data model and, if
> so, how?  If we extend the data model, what are the repercussions?
>
>  
Regards,


Nick.



Re: Storing data from large sources

Nick Hall-6
In reply to this post by Benny Malengier


Benny Malengier wrote:

> For me,
>
> 1. Repository is where you find a source. We should not misuse it

I agree.  The repository for a census might be "National Archives".


>
> 2. A source is the book/register, or a part of it. The source holds
> information, and literal transcripts of a source should hence be
> stored in this object.

Yes.  The source for a census might be "1851 England Census".

You could store literal transcripts here, but you would have a large
number of them.  Wouldn't it be better to store them in a Source
Reference where you would only have the transcript of a page?


> A source does _not_ have what we call attributes, a source has "Data".
> This is not exported to GEDCOM. The Data is not shared, the source is
> what is shared.
>
> 3. An event is something happening to a person/family at a certain
> time/place. Census event is the census taker that passes and writes
> info in the census source.

Yes.  A census event will in general have several people attached to
it.  It will also have a census source with the source reference
containing a full reference to its page, and possibly a transcript (this
is on my to-do list).


>
> 4. You learn from a source information about a person or family, so
> you want to add information about the person/family in the
> person/family object. You add this information, eg an attribute:
> Description, Blue eyes.  Source of this attribute is the census source.

OK, this is where we have a problem.   One of the reasons that I wrote
the census add-ons is that it is common to get contradictory
information.  You want to record all this information against a
Person/Census combination.   The natural place to store this is as an
attribute on the event reference object.


>
> I don't see problems here, except for the fact that you can only store
> the census data in the source as a note if you want it stored. So
> there is no 'database scheme' for it. You can use Source Data for
> key-value pairs.

I don't like the idea of using Source Data.  Storing a transcript
against a source and/or source reference as a shared Note is a good idea.


>
> Now, the other way around. You have a person, and you see a source
> saying green eyes. You go to attributes and you see blue eyes. You
> wonder if there is no error.

Good example.  You might have added it from a census or from another source.


> You click on the attribute to from what source you have this
> information, you open the census source, and you look at the data
> inside of it. If you used a note for the data in the census, you can
> share it in the source reference, and you know what the census said.
> If you are uncertain and you want to recheck the census, you go to the
> repository tab and you see where this census is stored to check in the
> repository the source again.

Well, at this point you probably want to stop editing and run some
reports to examine your data.  The Census report is written just for
this purpose: it allows you to compare all census data for a person in
a structured way.  Once you have evaluated your data, you can go back
and edit the record.
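The kind of comparison such a report performs can be sketched as below. This is not the Census report add-on's code; it is a minimal illustration assuming each census appearance of one person has been reduced to a census title and a dict of column values, and the `compare` helper is hypothetical.

```python
from collections import defaultdict

def compare(appearances):
    """Tabulate one person's census values by column and flag disagreements.

    `appearances` is a list of (census title, {column: value}) pairs.
    """
    by_column = defaultdict(dict)
    for census, values in appearances:
        for column, value in values.items():
            by_column[column][census] = value
    # A column conflicts when different censuses recorded different values.
    conflicts = {
        column: seen
        for column, seen in by_column.items()
        if len(set(seen.values())) > 1
    }
    return dict(by_column), conflicts

appearances = [
    ("1851 census", {"Place of birth": "Kent", "Occupation": "Farmer"}),
    ("1861 census", {"Place of birth": "Surrey", "Occupation": "Farmer"}),
]
table, conflicts = compare(appearances)
```

Only "Place of birth" is flagged here; the consistent occupation is simply tabulated.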


>
> So, In all this, you normally _don't_ check the census event!

It's not really a matter of checking an event.  We want the data stored
in a structured manner so that we can run reports to analyse the data.


> It seems stupid to me to store data obtained in the census taking there.

I was suggesting storing data such as "number of rooms" as attributes of
a census event.  Again, this is a natural place to store the data and
allows convenient access for the Census report and Census editor.


> At most, I would share a note with the transcript there.

I would prefer for transcripts to be stored on the Source Reference
rather than Event.  I only suggested storing an image on the Event
because it is not possible to store it on the Source Reference.


>
> So, in my view, the way census gramplet works is wrong.

Well, I see it as: transcripts and images on the Source Reference or
Source, possibly shared; data extracted from this source data on People,
Families and Events.


Nick.



Re: Storing data from large sources

Doug
In reply to this post by Gerald Britton-2
On 29/11/10 17:08, Gerald Britton wrote:

> I want to open up a discussion about how best to store data from large
> sources.  By "large" I mean sources such as registers, logs, family
> bibles, censuses, member lists and other things that contain many
> entries (millions, in the case of a census).  Usually, each entry in
> such a document has several columns of data.  For example, a marriage
> register at a church will have names of the bride, groom, witnesses
> and possibly parents, date, officiating minister's name and other
> things.  A death register may include cause of death, place and other
> things.  A census may have all sorts of data, including whether the
> house was brick or frame and how many stories it had, if the person
> was an employer (and if so, how many employees he had) or employee,
> whether he was deaf, dumb, blind, crazy (or "lunatic") and how many
> sheep he has.
>
> One challenge with this sort of data is the document in which it is
> found.  If we treat it as a source (which seems natural to me) then we
> have a problem storing all the bits of data we find for one
> individual.  You can't use source attributes, since those are shared
> in gramps.  We could treat the document as a repository and the
> entries as sources from that repository, but that seems unnatural when
> you are looking at a book or a bible or a microfilm.

There was a long discussion very much along these lines some
months ago on the users' list.
It seems to me that there's a need to have the possibility
of recognising some grouping of sources at a level between
the individual source and their current geographical
location. Sometimes it's natural, say, someone's personal
library housed within a national library. Other times it's
certainly a bit artificial, like calling an LDS film a
repository, although we can sort of get round that by
specifying the type of repository to be Microfilm - for me,
not so much a problem as a matter of taste.

> Also, since a
> repository can't have a repository (i.e. you can't nest them) and the
> real repository (building, web site, etc.) may house many such
> sources, how can you bring all these pseudo-repositories together
> under that real repository?

That's the real problem, I think.
It would make more sense if the address of the repository
were a Place.
Then there would be no difficulty in having many
repositories sharing the same Place (geographical location).
Also if a repository were moved - say, a private library to
another institution - only the Place of the repository need
be changed, instead of needing to create and enter a new
repository for all the sources located there and having to
delete the previous one.
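A minimal sketch of that suggestion, with hypothetical classes (not the Gramps API) and an invented `repositories_at` helper: each repository refers to a shared Place rather than holding its own address, so several repositories can share one Place and moving a repository means changing a single reference.

```python
from dataclasses import dataclass

# Hypothetical, simplified classes (not the Gramps API).
@dataclass
class Place:
    title: str

@dataclass
class Repository:
    name: str
    place: Place  # a shared Place instead of a private address

@dataclass
class Source:
    title: str
    repository: Repository

def repositories_at(place, repositories):
    """Names of the repositories located at a given Place."""
    return [r.name for r in repositories if r.place is place]

home = Place("Example Street, Example Town")
library = Place("National Library, Example City")
family = Repository("Family library", home)
films = Repository("Parish register microfilms", library)
sources = [Source("Family bible", family), Source("Letters", family)]

# The family library is given to the national library: change one reference,
# and every source in that repository follows.
family.place = library
```

The sources never need to be re-entered; they still point at the same repository, which now points at the new Place.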


>
> Another challenge is all the bits of data we find in this document.
> Some should no doubt find their way to other objects: the cause of
> death to a death event attribute;

> the witness names to a marriage
> event attribute;

I don't understand. Doesn't the marriage event with its
attributes and references collect all the information in one
place? What did you have in mind?


Doug


Re: Storing data from large sources

Gerald Britton-2
In reply to this post by Benny Malengier
On Tue, Nov 30, 2010 at 3:48 AM, Benny Malengier
<[hidden email]> wrote:

>
>
> 2010/11/29 Gerald Britton <[hidden email]>
>>
>> I want to open up a discussion about how best to store data from large
>> sources.  By "large" I mean sources such as registers, logs, family
>> bibles, censuses, member lists and other things that contain many
>> entries (millions, in the case of a census).  Usually, each entry in
>> such a document has several columns of data.  For example, a marriage
>> register at a church will have names of the bride, groom, witnesses
>> and possibly parents, date, officiating minister's name and other
>> things.  A death register may include cause of death, place and other
>> things.  A census may have all sorts of data, including whether the
>> house was brick or frame and how many stories it had, if the person
>> was an employer (and if so, how many employees he had) or employee,
>> whether he was deaf, dumb, blind, crazy (or "lunatic") and how many
>> sheep he has.
>>
>> One challenge with this sort of data is the document in which it is
>> found.  If we treat it as a source (which seems natural to me) then we
>> have a problem storing all the bits of data we find for one
>> individual.  You can't use source attributes, since those are shared
>> in gramps.  We could treat the document as a repository and the
>> entries as sources from that repository, but that seems unnatural when
>> you are looking at a book or a bible or a microfilm.  Also, since a
>> repository can't have a repository (i.e. you can't nest them) and the
>> real repository (building, web site, etc.) may house many such
>> sources, how can you bring all these pseudo-repositories together
>> under that real repository?
>>
>> Another challenge is all the bits of data we find in this document.
>> Some should no doubt find there way to other objects: the cause of
>> death to a death event attribute; the witness names to a marriage
>> event attribute; the house construction and size to a residence
>> attribute, perhaps.  However, it is still good (very good, I feel) to
>> also keep all this data together and tie it to the source in which it
>> is found.
>>
>> The Census gramplet addresses this by using event reference
>> attributes.  So the fact that the person was a lunatic is recorded in
>> an attribute of the event reference -- Lunatic: yes -- and similarly,
>> the other attributes.  This solves the immediate problem for censuses
>> but may not be generally extensible to other documents -- especially
>> if there are multiple documents for some event that disagree with each
>> other.  Furthermore, it is not clear to me that this is the best way
>> to handle this data since the data is not really an attribute of the
>> event but of the source document since the data was recorded in the
>> source document at the time of the event.
>>
>> I'm now wondering if we should add attributes to source references
>> analogous to event references.  If available, this would be a natural
>> place to store all the bits of data for each entry while keeping one
>> source object (the book, film, etc.) at the repositories where it can
>> be found. On the other hand, introducing source reference attributes
>> may introduce challenges for GEDCOM exports and imports.
>>
>> So let's discuss!  What creative ways can we devise to handle these
>> sorts of source documents?  Should we extend our data model and, if
>> so, how?  If we extend the data model, what are the repercussions?
>>
>
> For me,
>
> 1. Repository is where you find a source. We should not misuse it.

Yup,  I use it the same way

>
> 2. Source is the book/register, or a part of it. The source holds
> information, and literal transcripts of a source should hence be stored in
> this object.
> A source does _not_ have what we call attributes, a source has "Data". This
> is not exported to GEDCOM. The Data is not shared, the source is what is
> shared.

Yup

>
> 3. An event is something happening to a person/family at a certain
> time/place. The census event is the census taker who passes by and writes
> information into the census source.

exactly

>
> 4. From a source you learn information about a person or family, so you want
> to add that information to the person/family object. You add it as, e.g., an
> attribute Description: Blue eyes. The source of this attribute is the census
> source.

Yes, but I think it is good to keep the information that you find in
one place as well.  This is what the census gramplet does and it is
very useful, since the data extracted from the census lives
independently.

>
> I don't see problems here, except for the fact that you can only store the
> census data in the source as a note if you want it stored. So there is no
> 'database scheme' for it. You can use Source Data for key-value pairs.

Except that it rapidly becomes unmanageable.  A census is a single
source with millions of entries, each of which will have tens of
attributes.  Source data key/value pairs are completely insufficient
for this purpose.  You would have to make up keys that include some
identification information for each census entry you want to record.
So, for some census, you would have:

JohnDoe_House: brick
NancyDrew_House: stone

etc.  Since you might have hundreds of members of your family tree in
the census, this is clearly unworkable.  Unless you abuse the notion
of a Source and make one up for every census entry, that is.  (A
method I cannot accept)

This is exactly where my suggestion to create Source Reference
attributes arises.  And, it is not limited to census data.  It applies
equally well to birth, marriage and death registers and many other
similar, bulk sources, that contain many columns of data on hundreds,
thousands or even millions of individuals.
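A minimal Python sketch of the contrast Gerald is drawing, assuming simplified, hypothetical `Source` and `SourceRef` classes (not the real Gramps objects). The flat `data` map stands for today's Source Data; the `attributes` field on `SourceRef` stands for the proposed Source Reference attributes, which Gramps does not currently have.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical stand-in for a Gramps Source."""
    title: str
    # Today's "Source Data": one flat key/value map for the whole source.
    data: dict[str, str] = field(default_factory=dict)


@dataclass
class SourceRef:
    """Hypothetical stand-in for a source reference."""
    source: Source
    page: str = ""
    # Proposed: key/value pairs describing just this entry of the source.
    attributes: dict[str, str] = field(default_factory=dict)


census = Source("Census (example)")

# Approach 1: smuggle the person's identity into each Source Data key.
census.data["JohnDoe_House"] = "brick"
census.data["NancyDrew_House"] = "stone"

# Approach 2: generic column names, stored on each person's reference.
john_ref = SourceRef(census, page="p. 12", attributes={"House": "brick"})
nancy_ref = SourceRef(census, page="p. 40", attributes={"House": "stone"})


def houses_from_data(source: Source) -> dict[str, str]:
    """Recover one column by parsing the made-up keys."""
    return {
        key.rsplit("_", 1)[0]: value
        for key, value in source.data.items()
        if key.endswith("_House")
    }


def houses_from_refs(refs: list[SourceRef]) -> list[str]:
    """Read the same column directly from each reference."""
    return [ref.attributes.get("House", "") for ref in refs]


print(houses_from_data(census))                 # {'JohnDoe': 'brick', 'NancyDrew': 'stone'}
print(houses_from_refs([john_ref, nancy_ref]))  # ['brick', 'stone']
```

With 140 columns per person, approach 1 needs 140 invented keys per person in one shared map, while approach 2 reuses the same 140 column names on every reference, which is what makes searching and comparing feasible.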

>
> Now, the other way around. You have a person, and you see a source saying
> green eyes. You go to attributes and you see blue eyes. You wonder whether there
> is an error. You click on the attribute to see from what source you have this
> information, you open the census source, and you look at the data inside of
> it. If you used a note for the data in the census, you can share it in the
> source reference, and you know what the census said. If you are uncertain
> and you want to recheck the census, you go to the repository tab and you see
> where this census is stored to check in the repository the source again.
>
> So, in all this, you normally _don't_ check the census event! It seems
> stupid to me to store data obtained in the census taking there. At most, I
> would share a note with the transcript there.

Difficult to search, impossible to compare, except for the most basic
data.  Remember that I have censuses with up to 140 key/value pairs
per individual.

>
> So, in my view, the way the Census gramplet works is wrong.

I think it works the best that it can with the current schema.  If we
had Source Reference Attributes or something similar, we could put the
data there instead.

>
> Benny
>



--
Gerald Britton


Re: Storing data from large sources

Tim Lyons
Administrator
Gerald Britton-2 wrote:
> Except that it rapidly becomes unmanageable.  A census is a single
> source with millions of entries, each of which will have tens of
> attributes.  Source data key/value pairs are completely insufficient
> for this purpose.  You would have to make up keys that include some
> identification information for each census entry you want to record.
> So, for some census, you would have:
>
> JohnDoe_House: brick
> NancyDrew_House: stone
>
> etc.  Since you might have hundreds of members of your family tree in
> the census, this is clearly unworkable.  Unless you abuse the notion
> of a Source and make one up for every census entry, that is.  (A
> method I cannot accept)
>
> This is exactly where my suggestion to create Source Reference
> attributes arises.  And, it is not limited to census data.  It applies
> equally well to birth, marriage and death registers and many other
> similar, bulk sources, that contain many columns of data on hundreds,
> thousands or even millions of individuals.
>  --<snip>
> Difficult to search, impossible to compare, except for the most basic
> data.  Remember that I have censuses with up to 140 key/value pairs
> per individual.

I want to argue very strongly for adding media (and, if wanted, attributes) to Source References, and for Source References to become first-class objects so that they can be shared.

At present, if you treat a "large source" as a single source, then you have problems managing (manually and by convention) the components of that source. You can store transcripts of parts of the source in Source References, but they are copied onto each object that refers to them and so are independent, which makes them hard to find and update. You can't store attributes, and you can't store media on the source reference. Storing a transcript on a source reference as a shared note does avoid some problems, but the information on the source reference is still independent, so you can get different representations of the page number in what should be the same source reference. As Gerald says, using source data for key/value pairs rapidly becomes unmanageable, and the data he wants to store are actually properties of the source reference.
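A small Python sketch of the page-number drift described above, using hypothetical `Citation` and `Person` classes (not Gramps code). Embedded references are modelled as independent copies; shared references as one object that several people point to. A real database would share via handles rather than Python object identity.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical per-page reference data."""
    source_title: str
    page: str


@dataclass
class Person:
    name: str
    citation: Citation


# Embedded model: each person carries its own copy of the reference.
template = Citation("Census (example)", "p. 12")
alice = Person("Alice", copy.deepcopy(template))
bob = Person("Bob", copy.deepcopy(template))

# A correction typed on one copy never reaches the other.
alice.citation.page = "page 12, entry 3"
print(alice.citation.page == bob.citation.page)   # False

# Shared model: both people point at the same reference object.
shared = Citation("Census (example)", "p. 12")
carol = Person("Carol", shared)
dave = Person("Dave", shared)

# One edit is seen by every person that uses the reference.
shared.page = "page 12, entry 3"
print(carol.citation.page == dave.citation.page)  # True
```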

At present, if you treat each component as a separate source, then you have an unnatural breakdown that makes the links to repositories difficult to manage. It also ignores the Page number that is a built-in property of source references. I agree with Benny that a source is something like a book, so this is not the appropriate model. As Gerald says, "Unless you abuse the notion of a Source and make one up for every census entry, that is.  (A method I cannot accept)"

If you change Gramps so that you treat a 'large source' as a single source, but allow media links and, if you want, attributes on source references, and make those references shared, then you don't need any conventions for managing things like pages from that source. Each separate thing that is referenced is a separate source reference, and these can be shared.

One of the points Gerald made in his earlier postings was that limitations arose from our adherence to GEDCOM. In fact, however, GEDCOM provides both multimedia links and notes in Source Citations, which are the exact equivalent of Source References. I'm afraid I don't understand the distinction being drawn between 'attributes' and 'data'; it seems rather artificial to me. I don't think it matters from a theoretical point of view whether the things you use for storage in a source reference are notes, attributes or data. Use whatever is most convenient for the user. Similarly, I don't think it is important that 'Data' is not exported to GEDCOM. That is surely just a feature of the current code. GEDCOM source citations have notes and 'text from source', so I would expect data, attributes and notes all to be exported to suitable notes within GEDCOM in due course.

The fact that there have been many previous discussions along these lines in the Gramps lists does seem to me to indicate that there is something missing. I think that shared source references with media and attributes would meet most of the needs.

I would hope that making source references shared would be more or less transparent to most of the screens and reports in Gramps. Obviously you would need a new category of display, and you would need to allow the user to choose an existing source reference as well as create a new one. Technically you would indeed need a source reference reference, but that is only an awkward name. I imagine that one could even hide the link to the intermediate source reference object, so that a prototype of shared source reference objects might not need to change most of the rest of the code, which could continue to refer to a source reference and reach it through the s-r-r transparently. That is just a suggestion; I don't know enough about it to know whether it is really feasible or even desirable.
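One way the transparent s-r-r imagined above could look, as a Python sketch. `SharedSourceRef` and `SourceRefRef` are hypothetical names, not Gramps classes; the idea is only that the small per-object link forwards reads it does not handle itself to the shared record, so code that asks a link for `page` or `attributes` need not know the indirection exists.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedSourceRef:
    """Shared part: one record per cited page/entry (hypothetical)."""
    source_handle: str
    page: str
    media: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)


class SourceRefRef:
    """Per-object link that delegates unknown reads to the shared record.

    Writes are not forwarded: `link.page = ...` would create an attribute
    on the link itself, so edits should be made on the shared record.
    """

    def __init__(self, shared: SharedSourceRef, confidence: int = 2) -> None:
        self._shared = shared
        self.confidence = confidence  # belongs to this link only

    def __getattr__(self, name: str):
        # Called only when normal lookup fails. Private names are refused so
        # that a half-built object cannot recurse looking for _shared.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._shared, name)


shared = SharedSourceRef("S0001", "p. 12", attributes={"House": "brick"})
john_link = SourceRefRef(shared, confidence=3)
mary_link = SourceRefRef(shared)

print(john_link.page, mary_link.attributes["House"])  # p. 12 brick
shared.page = "page 12, entry 3"
print(mary_link.page)                                 # page 12, entry 3
```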

Re: Storing data from large sources

Nick Hall-6


Tim Lyons wrote:

> I want to argue very strongly for the addition of media and if wanted
> attributes to Source References, and for Source References to become first
> class objects so that they can be shared.
>  

Yes, I have often thought that this would be a good idea.

You are suggesting that we convert existing Source References into new
primary "Citation" objects. Where we now have a Source Reference, it
would become a reference to a Citation. A Citation would contain a
reference to a Source.

A Citation object could contain Attributes and Media objects.

We would create a Citation View, Editor and Selector. Existing editors
that create/edit/delete a Source Reference would need updating to
add/select/remove a Citation.
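A rough in-memory Python sketch of this proposal, assuming Citations live in their own table keyed by a handle and other objects store only handles. All names are hypothetical. The `referrers` helper hints at extra work a Citation Editor or a delete operation would need: finding every object that points at a given Citation.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Citation:
    """Hypothetical primary Citation object."""
    source_handle: str
    page: str
    attributes: dict[str, str] = field(default_factory=dict)
    media_handles: list[str] = field(default_factory=list)


@dataclass
class Person:
    name: str
    # Where a Source Reference used to be embedded, keep a Citation handle.
    citation_handles: list[str] = field(default_factory=list)


class CitationTable:
    """One row per Citation; this is the table that could grow large."""

    def __init__(self) -> None:
        self._rows: dict[str, Citation] = {}

    def add(self, citation: Citation) -> str:
        handle = uuid.uuid4().hex
        self._rows[handle] = citation
        return handle

    def get(self, handle: str) -> Citation:
        return self._rows[handle]

    def __len__(self) -> int:
        return len(self._rows)


def referrers(people: list[Person], handle: str) -> list[str]:
    """Names of everyone who cites the given Citation."""
    return [p.name for p in people if handle in p.citation_handles]


table = CitationTable()
entry = table.add(Citation("S0001", "p. 12", {"House": "brick"}))

# "Select an existing Citation" is just appending its handle.
john = Person("John Doe", [entry])
jane = Person("Jane Doe", [entry])

print(len(table), referrers([john, jane], entry))  # 1 ['John Doe', 'Jane Doe']
print(table.get(entry).attributes["House"])        # brick
```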

I think that this has been discussed and rejected in the past, but I'm
not sure why. The Citation table would be large, which may have been a
factor. Does anyone know anything about this?


Nick.



Re: Storing data from large sources

Benny Malengier


2010/12/2 Nick Hall <[hidden email]>


> Tim Lyons wrote:
> > I want to argue very strongly for the addition of media and if wanted
> > attributes to Source References, and for Source References to become first
> > class objects so that they can be shared.
>
> Yes, I have often thought that this would be a good idea.

And I think it is a very, very, very bad idea.
I think many of us mean the same thing but use different words, and use the same words for different things. That makes discussion impossible, so let's first define a common vocabulary and stick to it for the discussion.

Let me try to make some things clear.
You have a person, and you have a thing that stores the data of a source. Between the two, you will always have something that stores data about the relationship between them. At least, that is how I was taught it when doing my master's in applied informatics, and how I have seen it used everywhere I have worked. That is what the source reference currently is in Gramps. As a logical conclusion, if we make the source reference shared, we need a new unique, non-shared object between the two. A source reference reference? ;-)
Anyway, the word reference in Gramps _always_ indicates the relationship information between two objects, so don't use the word for things you want as a core object!

The discussion will always return to the same thing: how to handle large sources in Gramps. I already raised that when I started using Gramps, but the admins at the time did not really think the problem required changes.
In my book, it is best to mimic reality as closely as possible (with a minimum of complexity). So, think outside the box, and change how some objects behave.

My suggestion would be:

Source (Data=publication information) --> Source Content <----- source-object-relation ---> object

Our source-object-relation is the object storing the unique relationship, and we call that at present "source reference".

The question then becomes:
1/ is above suggestion good enough?
2/ what data must be stored under which object? E.g., does it really make sense to store media in the relationship object? Is the page number not part of the source content object? Should anything still be stored under the source reference?

What Tim calls a shared source reference is what I would call source content. What Nick calls Citation I call Source Content. If we list what attributes should be stored under which object, it will be more clear what the best name for the object is.

One can make things more complicated with an object model as:

Source (Data=publication information) --> Source Content <----- deduction-process ----> Information <--- information-object-relation ---> object

But I do not think our users would have the time to actually work like that. An important constraint for Gramps is that it must remain easy to create a family tree, even if that means giving up on mimicking reality as closely as possible.

I don't mind changes to the core of Gramps, but they must be _completely_ worked out so we are certain things are sound. So let's discuss, but in the end a GEP must first be written, fully documenting all the changes needed to make working with large sources a joy in Gramps.

So, the next step would be for somebody to list the objects above, indicating which of our present fields belong to which object, and what extra fields may be needed.

To finish defining the vocabulary: Attributes are Data with a source (and notes, but those are less important). Data are key/value pairs. As a source does not link to another source, it has Data, not Attributes. Feel free to come up with better names.
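The Data/Attribute distinction above can be sketched in Python as follows. The names are hypothetical and simplified (not the real Gramps `Attribute` class): Data is a bare key/value map, while an Attribute is a key/value pair that carries its own evidence, i.e. citations and notes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Data: plain key/value pairs, e.g. a source's publication information.
# Nothing here points at further evidence.
source_data: dict[str, str] = {"Author": "Parish clerk", "Year": "1881"}


@dataclass
class Attribute:
    """Key/value pair backed by evidence (hypothetical shape)."""
    key: str
    value: str
    citation_handles: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


@dataclass
class Person:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def unsupported(person: Person) -> list[str]:
    """Keys of attributes that cite no source at all."""
    return [a.key for a in person.attributes if not a.citation_handles]


john = Person(
    "John Doe",
    [
        Attribute("Description", "Blue eyes", citation_handles=["C0001"]),
        Attribute("Occupation", "Farmer"),
    ],
)
print(unsupported(john))  # ['Occupation']
```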

Benny



Re: Storing data from large sources

Nick Hall-6


Benny Malengier wrote:

>
>
> 2010/12/2 Nick Hall <[hidden email]
> <mailto:[hidden email]>>
>
>
>
>     Tim Lyons wrote:
>     > Gerald Britton-2 wrote:
>     >
>     >> Except that it rapidly becomes unmanageable.  A census is a singe
>     >> source with millions of entries, each of which will have tens of
>     >> attributes.  Source data key/value pairs are completely
>     insufficient
>     >> for this purpose.  You would have to make up keys that include some
>     >> identification information for each census entry you want to
>     record.
>     >> So, for some census, you would have:
>     >>
>     >> JohnDoe_House: brick
>     >> NancyDrew_House: stone
>     >>
>     >> etc.  Since you might have hundreds of members of your family
>     tree in
>     >> the census, this is clearly unworkable.  Unless you abuse the
>     notion
>     >> of a Source and make one up for every census entry, that is.  (A
>     >> method I cannot accept)
>     >>
>     >> This is exactly where my suggestion to create Source Reference
>     >> attributes arises.  And, it is not limited to census data.  It
>     applies
>     >> equally well to birth, marriage and death registers and many other
>     >> similar, bulk sources, that contain many columns of data on
>     hundreds,
>     >> thousands or even millions of individuals.
>     >>  --<snip>
>     >> Difficult to search, impossible to compare, except for the most
>     basic
>     >> data.  Remember that I have censuses with up to 140 key/value pairs
>     >> per individual.
>     >>
>     >>
>     >
>     >
>     > I want to argue very strongly for the addition of media and if
>     wanted
>     > attributes to Source References, and for Source References to
>     become first
>     > class objects so that they can be shared.
>     >
>
>     Yes, I have often thought that this would be a good idea.
>
>
> And I think i is a very, very, very bad idea.
> I think many of us mean the same thing but use different words and use
> some equal words for different things, this does not allow discussion,
> so let's define a unique vocabulary first, and stick to it for the
> discussion.
>
> Let me try to make some things clear.
> You have a person, and you have a thing that stores data from a source.
> Between the two, you will always have something that stores data about
> the relationship between them. At least, that is how I was taught it
> when doing my master's in applied informatics, and how I have seen it
> used everywhere I have worked. That is what is now the source reference
> in Gramps. As a logical consequence, if we make the source reference
> shared, we need a new, unique, non-shared object between the two. A
> source reference reference? ;-)
> Anyway, the word "reference" in Gramps _always_ indicates the
> relationship information between two objects, so don't use the word
> for things you want as a core object!
>
> The discussion will always return to the same thing: how to handle
> large sources in Gramps. I already discussed that when I started using
> Gramps, but the admins then did not really think the problem required
> changes.
> In my book, the best approach is to mimic reality as closely as possible
> (with a minimum of complexity). So, think outside the box, and change how
> some objects behave.
>
> My suggestion would be:
>
> Source (Data=publication information) --> Source Content <----- source-object-relation ---> object
>
> Our source-object-relation is the object storing the unique
> relationship, and we call that at present "source reference".
>
> The question then becomes:
> 1/ is the above suggestion good enough?
> 2/ what data must be stored under which object? E.g., does it really
> make sense to store media in the relationship object? Is the page number
> not part of the source content object? Should there still be
> something stored under the source reference?
>
> What Tim calls a shared source reference is what I would call source
> content. What Nick calls Citation I call Source Content. If we list
> what attributes should be stored under which object, it will be more
> clear what the best name for the object is.

Benny,

I think we are all suggesting the same thing.  You have just described
it better, and we have been using different terminology.

The new primary object will be:

SourceContent
  SourceRef
  Date
  Volume/Page
  Note
  Media
  Data

All objects that contain SourceRef will be changed to contain
SourceContentRef:

SourceContentRef
  Confidence
  Note

SourceRef will become the reference between a SourceContent and a Source
object.  I don't think that it will have any content except the source
handle.
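A minimal Python sketch (Python because Gramps is written in it) of what this split could mean for existing data: per-object source references with the same source, page and date are folded into one shared SourceContent, and each object keeps a small SourceContentRef. All class names, the `C0000` handle scheme and the "same source, page and date" matching rule are assumptions for illustration, not Gramps code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified stand-ins for the objects discussed above;
# these are NOT the real Gramps classes.


@dataclass
class OldSourceRef:
    """Today's per-object source reference (simplified)."""
    source_handle: str
    page: str = ""
    date: str = ""
    confidence: int = 2            # assumed 0..4 scale, 2 = "Normal"
    note: Optional[str] = None


@dataclass
class SourceContent:
    """Proposed shared primary object (the "Citation")."""
    handle: str
    source_handle: str             # the slimmed-down SourceRef: only a handle
    date: str = ""
    page: str = ""                 # "Volume/Page"
    notes: List[str] = field(default_factory=list)
    media: List[str] = field(default_factory=list)
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class SourceContentRef:
    """What a person, event, etc. would hold instead of a SourceRef."""
    content_handle: str
    confidence: int = 2
    note: Optional[str] = None


def migrate(refs: List[OldSourceRef]) -> Tuple[Dict[str, SourceContent], List[SourceContentRef]]:
    """Fold references with the same (source, page, date) into one shared SourceContent."""
    contents: Dict[str, SourceContent] = {}
    handle_for: Dict[Tuple[str, str, str], str] = {}
    new_refs: List[SourceContentRef] = []
    for ref in refs:
        page = ref.page.strip()    # assumed normalisation: ignore stray whitespace
        key = (ref.source_handle, page, ref.date)
        if key not in handle_for:
            handle = f"C{len(contents):04d}"   # hypothetical handle scheme
            handle_for[key] = handle
            contents[handle] = SourceContent(handle, ref.source_handle, ref.date, page)
        # Confidence and the per-object note stay on the (unshared) reference.
        new_refs.append(SourceContentRef(handle_for[key], ref.confidence, ref.note))
    return contents, new_refs


if __name__ == "__main__":
    old = [
        OldSourceRef("S1", "p. 12", "1881"),
        OldSourceRef("S1", "p. 12 ", "1881", note="head of household"),
        OldSourceRef("S1", "p. 13", "1881"),
    ]
    contents, refs = migrate(old)
    print(len(contents))                       # 2
    print([r.content_handle for r in refs])    # ['C0000', 'C0000', 'C0001']
```

Keeping confidence and the per-object note on the unshared reference follows the field split above; how strictly two old references must match before they are merged is a decision the thread leaves open.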

I think that I would still call "Source Content", "Citation".  :)

Creating a SourceContent view would be easy.   The SourceContent
selector should always display a source title to provide context.

The extra level could be tedious for users when entering a new source.  
We would have to be careful about the design of the editors.

Would this extra level be confusing for Aunt Martha?  A user could
create a SourceContent without a Source.

The data stored in the SourceContent object may change with GEPS 018:
Evidence style sources.

Has this been discussed before?


Nick.


>
> One can make things more complicated with an object model as:
>
> Source (Data=publication information) --> Source Content <----- deduction-process ----> Information <--- information-object-relation ---> object
>
> But I do not think our users would have the time to actually work like
> that. An important constraint for Gramps is that it must still be easy
> to create a family tree, even if that means giving up on mimicking
> reality as closely as possible.
>
> I don't mind changes to the core of Gramps, but they must be
> _completely_ worked out so we are certain things are sound. So let's
> discuss, but in the end, first a GEP must be made with a full
> documentation of all the changes to make working with large sources a
> joy in Gramps.
>
> So, next step would be somebody listing the objects above, and
> indicating what of our present fields go with what object, and what
> possible extra fields are needed.
>
> To finish defining the vocabulary: Attributes are Data with a source
> (and notes, but those are less important). Data are key/value pairs. As a
> source does not link to another source, it has Data, not Attributes. Feel
> free to come up with better names.
>
> Benny
>
>
>     You are suggesting that we convert existing Source References into new
>     primary "Citation" objects. Where we now have a Source Reference, it
>     would become a reference to a Citation. A Citation would contain a
>     reference to a Source.
>
>     A Citation object could contain Attributes and Media objects.
>
>     We would create a Citation View, Editor and Selector. Existing editors
>     that create/edit/delete a Source Reference would need updating to
>     add/select/remove a Citation.
>
>     I think that this has been discussed and rejected in the past, but I'm
>     not sure why. The Citation table would be large, which may have been a
>     factor. Does anyone know anything about this?
>
>
>     Nick.
>
>
>     > At present, if you treat a "large source" as a single source, then you
>     > have problems managing (manually and by conventions) the components of
>     > that source. You can store transcripts of parts of the source in Source
>     > References, but they are copied on each object that refers to them, so
>     > they are independent, which makes them hard to find and update. You
>     > can't store attributes, and you can't store media on the source
>     > reference. Storing a transcript on a source reference as a shared note
>     > does avoid some problems, but the information on the source reference
>     > is still independent, so you can get different representations of the
>     > page number in what should be the same source reference. As Gerald
>     > says, using source data for key/value pairs rapidly becomes
>     > unmanageable, and the data he wants to store are actually properties of
>     > the source reference.
>     >
>     > At present, if you treat each component as a separate source, then you
>     > have an unnatural breakdown that causes difficulties in managing the
>     > links to repositories. It also does not respect the Page number, which
>     > is a built-in property of source references. I agree with Benny that a
>     > source is something like a book, so this is not the appropriate model.
>     > As Gerald says, "Unless you abuse the notion of a Source and make one up
>     > for every census entry, that is.  (A method I cannot accept)"
>     >
>     > If you change Gramps so that you treat a 'large source' as a single
>     > source, but allow media links and, if you want, attributes, and they
>     > are shared, then you don't need any conventions for managing things
>     > like pages from that source. Each separate thing that is referenced is
>     > a separate source reference, and these can be shared.
>     >
>     > One of the points Gerald made in his earlier postings was that
>     > limitations arose from our adherence to GEDCOM. In fact, however,
>     > GEDCOM provides both multimedia links and notes in Source Citations,
>     > which are the exact equivalent of Source References. I'm afraid I don't
>     > understand the distinction being drawn between 'attributes' and
>     > 'data'; it seems rather artificial to me. I don't think it matters from
>     > a theoretical point of view whether the things you use for storage in a
>     > source reference are notes, attributes or data; we should use whatever
>     > is most convenient for the user. Similarly, I don't think it is
>     > important that 'Data' is not exported to GEDCOM. This is surely just a
>     > feature of the current code. GEDCOM source citations have notes and
>     > 'text from source', so I would expect data, attributes and notes all
>     > to be exported to suitable notes within GEDCOM in due course.
>     >
>     > The fact that there have been many previous discussions along these
>     > lines in the Gramps lists does seem to me to indicate that there is
>     > something missing. I think that shared source references with media
>     > and attributes would meet most of the needs.
>     >
>     > I would hope that making source references shared would be more or
>     > less transparent to most of the screens and reports in Gramps.
>     > Obviously you would need a new category of display, and you would need
>     > to allow the user to choose an existing source reference as well as
>     > create a new one. Technically you would indeed need a source reference
>     > reference, but that is only a technically awkward name. I imagine that
>     > one could even hide the link to the intermediate source reference
>     > object, so that a prototype of shared source reference objects might
>     > not even need to change most of the rest of the code, which could
>     > continue to refer to a source reference and get to it through the
>     > s-r-r transparently. This is just a suggestion; I don't know enough
>     > about it to know whether it is really feasible or even desirable.
>     >
>
>     ------------------------------------------------------------------------------
>     Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
>     Tap into the largest installed PC base & get more eyes on your game by
>     optimizing for Intel(R) Graphics Technology. Get started today
>     with the
>     Intel(R) Software Partner Program. Five $500 cash prizes are up
>     for grabs.
>     http://p.sf.net/sfu/intelisp-dev2dev
>     _______________________________________________
>     Gramps-devel mailing list
>     [hidden email]
>     <mailto:[hidden email]>
>     https://lists.sourceforge.net/lists/listinfo/gramps-devel
>
>


Re: Storing data from large sources

Benny Malengier


2010/12/2 Nick Hall <[hidden email]>
> --<snip>
> Has this been discussed before?

Many emails, no design.

We could make things more versatile and treat the information learned from a source as an object.
So
1 Source can have N SourceContent
1 Information can refer to several SourceContent
1 object can have several Informations.

In this way, an Information is like a citation, but it also holds the conclusion itself.
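This more versatile variant can be sketched with three relations. The classes below are hypothetical and model only the cardinalities listed (one Source with many SourceContent entries, an Information resting on several SourceContent entries and holding the conclusion, an object with several Informations); field names and sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes illustrating the cardinalities only.


@dataclass
class Source:
    title: str
    contents: List["SourceContent"] = field(default_factory=list)    # 1 Source -> N SourceContent


@dataclass
class SourceContent:
    source: Source = field(repr=False, compare=False)  # back-link, excluded to avoid recursive repr/eq
    page: str = ""


@dataclass
class Information:
    """A conclusion plus the SourceContent entries it rests on."""
    conclusion: str
    evidence: List[SourceContent] = field(default_factory=list)      # 1 Information -> several SourceContent


@dataclass
class Person:
    name: str
    informations: List[Information] = field(default_factory=list)    # 1 object -> several Informations


def add_content(source: Source, page: str) -> SourceContent:
    """Create a SourceContent and register it with its Source."""
    content = SourceContent(source, page)
    source.contents.append(content)
    return content


def sources_behind(person: Person) -> List[str]:
    """Distinct source titles supporting a person's conclusions, in first-seen order."""
    titles: List[str] = []
    for info in person.informations:
        for content in info.evidence:
            if content.source.title not in titles:
                titles.append(content.source.title)
    return titles


if __name__ == "__main__":
    census = Source("Census")
    register = Source("Parish register")
    entry = add_content(census, "folio 3")
    baptism = add_content(register, "entry 57")
    john = Person("John Doe", [Information("born about 1850", [entry, baptism])])
    print(sources_behind(john))   # ['Census', 'Parish register']
```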

Anyway, that is also complicated. Going back to your design, I don't think SourceContent must be presented in the interface as a core object. Instead, a Source-SourceContent treeview seems more natural to me. I would do the attributes differently.

Source
   1 Title
   1 Author
   1 Gramps ID
   1 Abbr
   1 Publication Information
   1 Global Confidence
   n Publication Data (key value pairs, eg Publication Date, Publisher, ...)
   n MediaRef (Region, Src, attr, notes)  --> Media
   n RepoRef (Type, Callnumber)           --> Repo

SourceContent
    1 Source (GrampsID)
    1 Confidence (5 values)
    1 Volume
    1 Page
    1 LogDate
    1 Linenumber
    1 Position (eg. Upper Left Corner of image)
    n Information (key, value pairs, current Data)
    n NoteIds
    n MediaRef (Region, Src, attr, notes)  --> Media

SourceContentRef (called Citation in the interface, part of objects with sources)
  1 Type: Transcript or Deduction
  1 Deduction Confidence (5 values)
  1 Argumentation (one line string)
  n Note
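As a rough illustration, this field list could be transcribed into Python record types as below. It is a hypothetical sketch: names are adapted to Python style, media and repository references are reduced to handle strings, and the sample census values are invented. The one function shows the kind of query Gerald described as difficult with Data keys such as `JohnDoe_House`: searching one key across many entries.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class Confidence(IntEnum):
    """The "5 values"; names follow the labels Gramps shows."""
    VERY_LOW = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    VERY_HIGH = 4


@dataclass
class Source:
    gramps_id: str
    title: str
    author: str = ""
    abbr: str = ""
    publication_information: str = ""
    global_confidence: Confidence = Confidence.NORMAL
    publication_data: Dict[str, str] = field(default_factory=dict)  # n key/value pairs
    media_refs: List[str] = field(default_factory=list)             # n MediaRef --> Media
    repo_refs: List[str] = field(default_factory=list)              # n RepoRef --> Repo


@dataclass
class SourceContent:
    source_id: str                            # 1 Source (Gramps ID)
    confidence: Confidence = Confidence.NORMAL
    volume: str = ""
    page: str = ""
    log_date: str = ""
    line_number: Optional[int] = None
    position: str = ""                        # e.g. "upper left corner of image"
    information: Dict[str, str] = field(default_factory=dict)  # n key/value pairs (current Data)
    note_ids: List[str] = field(default_factory=list)
    media_refs: List[str] = field(default_factory=list)


@dataclass
class SourceContentRef:
    """Called "Citation" in the interface; held by people, events, etc."""
    content: SourceContent
    citation_type: str = "Transcript"         # "Transcript" or "Deduction"
    deduction_confidence: Confidence = Confidence.NORMAL
    argumentation: str = ""                   # one-line string
    notes: List[str] = field(default_factory=list)


def find_entries(entries: List[SourceContent], key: str, value: str) -> List[SourceContent]:
    """Search the Information pairs of many entries, e.g. every brick house in one census."""
    return [e for e in entries if e.information.get(key) == value]


if __name__ == "__main__":
    census = Source("S0001", "Census")
    rows = [
        SourceContent(census.gramps_id, page="12", line_number=3,
                      information={"Name": "John Doe", "House": "brick"}),
        SourceContent(census.gramps_id, page="12", line_number=4,
                      information={"Name": "Nancy Drew", "House": "stone"}),
    ]
    print([e.information["Name"] for e in find_entries(rows, "House", "brick")])  # ['John Doe']
```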


So, in this design, one must envision Source and SourceContent as forming one single editor. The Source editor would have a list of SourceContent entries, and if you select one, you see the details of that content on the right.
When adding a citation to, e.g., a person, you get a Source-SourceContent treeview, so if you select a census entry, you immediately see the data you stored for that entry.
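The selector described here needs the SourceContent entries grouped under their Source. A real Gramps editor would presumably use a GTK tree model; the sketch below stays self-contained and only shows the grouping plus a plain-text rendering in which each entry appears with its stored data. All names and sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical flat rows: (source title, entry label, stored key/value data).
Row = Tuple[str, str, Dict[str, str]]
Tree = Dict[str, List[Tuple[str, Dict[str, str]]]]


def build_tree(rows: List[Row]) -> Tree:
    """Group SourceContent entries under their Source, keeping input order."""
    tree: Tree = {}
    for source, label, data in rows:
        tree.setdefault(source, []).append((label, data))
    return tree


def render(tree: Tree) -> List[str]:
    """Plain-text stand-in for a two-level Source -> SourceContent treeview."""
    lines: List[str] = []
    for source, children in tree.items():
        lines.append(source)
        for label, data in children:
            summary = ", ".join(f"{k}={v}" for k, v in data.items())
            lines.append(f"    {label}: {summary}")
    return lines


if __name__ == "__main__":
    rows = [
        ("Census", "p. 12, line 3", {"Name": "John Doe", "House": "brick"}),
        ("Census", "p. 12, line 4", {"Name": "Nancy Drew", "House": "stone"}),
        ("Marriage register", "entry 57", {"Groom": "John Doe"}),
    ]
    print("\n".join(render(build_tree(rows))))
```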

Confidence is given at three levels: globally (on the Source), for the content (the same as the global value by default), and for the deduction. The SourceContentRef is there to hold the process of deducing the information you add to, e.g., a person as coming from a source. In many cases, a pure transcript of the source is made and no deduction happens, in which case this object contains nothing of interest. If, however, one makes a deduction, then one can record it here specifically. E.g., you find the name Nic__ where you cannot make out the last handwritten letters, and you save the name as Nick, with reference to this source. The source reference can then indicate why you decided to use Nick and not, e.g., Nicki.
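The three confidence levels and the Nic__/Nick example can be sketched as follows. Assumptions, not from the thread: a content confidence of `None` means "use the source's global value", and the overall confidence of a citation is the weakest of the levels involved. Class and field names are illustrative only.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Confidence(IntEnum):
    VERY_LOW = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    VERY_HIGH = 4


@dataclass
class Source:
    title: str
    global_confidence: Confidence = Confidence.NORMAL


@dataclass
class SourceContent:
    source: Source
    transcript: str
    confidence: Optional[Confidence] = None    # None: same as the source's global value

    def effective_confidence(self) -> Confidence:
        if self.confidence is None:
            return self.source.global_confidence
        return self.confidence


@dataclass
class SourceContentRef:
    content: SourceContent
    value_used: str                            # what was actually entered, e.g. on a person
    kind: str = "Transcript"                   # "Transcript" or "Deduction"
    deduction_confidence: Optional[Confidence] = None
    argumentation: str = ""                    # one line: why this reading

    def overall_confidence(self) -> Confidence:
        """Assumed rule (not from the thread): the weakest level wins."""
        levels = [self.content.effective_confidence()]
        if self.kind == "Deduction" and self.deduction_confidence is not None:
            levels.append(self.deduction_confidence)
        return min(levels)


if __name__ == "__main__":
    register = Source("Baptism register", Confidence.HIGH)
    entry = SourceContent(register, transcript="Nic__")
    ref = SourceContentRef(entry, "Nick", kind="Deduction",
                           deduction_confidence=Confidence.LOW,
                           argumentation="last letters illegible; space fits 'Nick' better than 'Nicki'")
    print(entry.effective_confidence().name)   # HIGH
    print(ref.overall_confidence().name)       # LOW
```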

Time for somebody else to refine/change things.

Benny



Re: Storing data from large sources

Tim Lyons
Administrator
Thanks for your suggestions, Benny. I think we may be moving towards a consensus. I have created a GEPS to outline a change.
Benny Malengier wrote
I don't think SourceContent
must be presented in the interface as a core object. Instead a treeview
Source-SourceContent seems more natural to me.
Thanks for the suggestion. A treeview for the Source View and the selector works well.
Benny Malengier wrote
I would do the attributes
differently.

Source
   1 Title
   1 Author
   1 Gramps ID
   1 Abbr
   1 Publication Information
   1 Global Confidence
   n Publication Data (key value pairs, eg Publication Date, Publisher, ...)
   n MediaRef (Region, Src, attr, notes)  --> Media
   n RepoRef (Type, Callnumber)           --> Repo

SourceContent
    1 Source (GrampsID)
    1 Confidence (5 values)
    1 Volume
    1 Page
    1 LogDate
    1 Linenumber
    1 Position (eg. Upper Left Corner of image)
    n Information (key, value pairs, current Data)
    n NoteIds
    n MediaRef (Region, Src, attr, notes)  --> Media
I agree except that I wouldn't remove the Notes field from the Source. This would be too awkward for people who are already using it, and is relevant where the source is not 'large'.

I wonder whether we should keep Volume/Page instead of separate Volume, Page, Linenumber and Position for this enhancement. There is a proposal (GEPS 018) which would change the fields in the SourceContent according to a Source Type.
http://gramps-project.org/wiki/index.php?title=GEPS_018:_Evidence_style_sources
I wonder whether it would be better that we wait for this, rather than changing the fields twice. In any case, there are plenty of cases where breakdowns other than the proposed one are more appropriate.
Benny Malengier wrote
SourceContentRef (called Citation in the interface, part of objects with
sources)
  1 Type: Transcript or Deduction
  1 Deduction Confidence (5 values)
  1 Argumentation (one line string)
  n Note
I have not included this in the GEPS, because it seems to relate to how deductions are stored, and as such may not be directly related to this enhancement. Also I am concerned that this may make the user experience too complicated. In the GEPS, users who are happy with the existing interface will see little change (change always frightens users); those who want more will be able to use the additional features.
Benny Malengier wrote
So, in this design, one must envision that Source and sourcecontent form one
single editor.
I agree, having a single editor makes things simpler for the user and ensures that the workflow does not get more complicated.
Benny Malengier wrote
When adding a citation to eg a person, you obtain a treeview
source-sourcecontent, so if you select a census entry, you immediately see
the data of that entry you stored.
I agree - a treeview will make it no more complicated to select an existing Source or SourceContent than it is at present.
Benny Malengier wrote
Confidence is given globally, of the content (same as globally by default),
and of the deduction. The SourceContentRef is there to hold the process of
deducing information you add to eg a person as coming from a source. In many
cases, a pure transcript of the source is done, and no deduction happens, in
which case this object contains nothing of interest. If one however makes a
deduction, then one can store this here specifically. Eg you find the name
Nic__ where you cannot make out what the last hand written letters are, and
you save the name as Nick, with reference to this source. Then the
sourcereference can indicate why you decide to use Nick and not eg Nicki.
As I mentioned, I have not included the fields of a SourceContentRef in the GEPS. They could be added if there is a general desire to do so.


The GEPS is at
http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources

Re: Storing data from large sources

Nick Hall-6
Benny,

Tim has done a lot of work on the GEPS, and it is now at a stage where I
think that it would be helpful if you could review it.

http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources

My main concern is that the Citation Reference editor may be rather
complicated and large. What do you think?

Could we combine the Type and Deduction Confidence in the citation
reference? A Transcript type would imply a high confidence, whereas a
deduction would be a lower confidence. We could go down to a "Guess"
which would imply a very low confidence.
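Merging Type and Deduction Confidence into one choice could look like this. The enum names and the exact level implied by each kind are assumptions (the message only says high, lower and very low); the point is that the user makes one choice instead of two.

```python
from enum import Enum


class Confidence(Enum):
    VERY_LOW = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    VERY_HIGH = 4


class CitationKind(Enum):
    """Hypothetical single choice replacing separate Type and Deduction Confidence."""
    TRANSCRIPT = "Transcript"
    DEDUCTION = "Deduction"
    GUESS = "Guess"


# Assumed values: the thread only says "high", "lower" and "very low".
IMPLIED_CONFIDENCE = {
    CitationKind.TRANSCRIPT: Confidence.HIGH,
    CitationKind.DEDUCTION: Confidence.NORMAL,
    CitationKind.GUESS: Confidence.VERY_LOW,
}


def implied_confidence(kind: CitationKind) -> Confidence:
    return IMPLIED_CONFIDENCE[kind]


def label(kind: CitationKind) -> str:
    """Text for a single drop-down entry, e.g. 'Guess (very low)'."""
    conf = implied_confidence(kind).name.replace("_", " ").lower()
    return f"{kind.value} ({conf})"


if __name__ == "__main__":
    for kind in CitationKind:
        print(label(kind))
```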

I didn't use the Source Confidence when I started to use Gramps, because
I was unsure which value to choose. It is more obvious how to use the
values in the GEDCOM standard (Direct/Primary, Secondary, Questionable,
Unreliable) than the Gramps values (Very High, High, Normal, Low, Very
Low). Perhaps we could choose more descriptive values for the Deduction
Type?
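For comparison, GEDCOM 5.5 stores this as a QUAY (certainty assessment) code from 0 (unreliable evidence or estimated data) to 3 (direct and primary evidence). One way to make the Gramps values more descriptive is to show the GEDCOM wording beside them. The level-to-QUAY mapping below is an assumption for illustration and is not necessarily what the Gramps GEDCOM exporter does.

```python
from typing import Optional

# GEDCOM 5.5 QUAY (certainty assessment) codes, with the short labels quoted above.
QUAY_LABELS = {
    3: "Direct/Primary",
    2: "Secondary",
    1: "Questionable",
    0: "Unreliable",
}

# The five Gramps levels, index 0 = Very Low ... 4 = Very High.
GRAMPS_NAMES = ["Very Low", "Low", "Normal", "High", "Very High"]

# Assumed mapping for illustration; "Normal" has no QUAY counterpart here.
GRAMPS_TO_QUAY = {0: 0, 1: 1, 2: None, 3: 2, 4: 3}


def quay_for(level: int) -> Optional[int]:
    """QUAY code to export for a Gramps level, or None to omit the tag."""
    if level not in GRAMPS_TO_QUAY:
        raise ValueError(f"unknown confidence level: {level}")
    return GRAMPS_TO_QUAY[level]


def descriptive_label(level: int) -> str:
    """Gramps name plus the GEDCOM wording, e.g. 'High (Secondary)'."""
    quay = quay_for(level)
    name = GRAMPS_NAMES[level]
    return name if quay is None else f"{name} ({QUAY_LABELS[quay]})"


if __name__ == "__main__":
    for level in range(5):
        print(level, descriptive_label(level), quay_for(level))
```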


Nick.




Re: Storing data from large sources

jerome
Hi,


Nice work.

I just wonder whether providing the ability to link sources together (source grouping) would not cover most cases? That was the scheme I proposed:

a sourceref with hlink and group attributes inside a source object.

http://gramps-project.org/wiki/index.php?title=Talk:GEPS_023:_Storing_data_from_large_sources

E.g.: a book/publication/index gives some source references. The primary source would be the book/publication/index; the child references would be the secondary sources. Large data might be stored in multiple sources, which are shared between persons, events, etc.
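As far as I can tell, this scheme gives a Source a list of references to other sources, each with an `hlink` and a `group` attribute. A minimal sketch under that reading (class names, the group labels and the sample sources are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ChildRef:
    hlink: str            # handle of the child (secondary) source
    group: str = ""       # hypothetical grouping label, e.g. "baptisms"


@dataclass
class Source:
    handle: str
    title: str
    children: List[ChildRef] = field(default_factory=list)


def secondary_sources(db: Dict[str, Source], primary: str, group: str = "") -> List[str]:
    """Titles of the sources linked directly from a primary source, optionally one group only."""
    return [db[ref.hlink].title for ref in db[primary].children
            if not group or ref.group == group]


def all_descendants(db: Dict[str, Source], root: str,
                    seen: Optional[Set[str]] = None) -> List[str]:
    """Depth-first titles below root; the seen set guards against link cycles."""
    seen = {root} if seen is None else seen
    titles: List[str] = []
    for ref in db[root].children:
        if ref.hlink in seen:
            continue
        seen.add(ref.hlink)
        titles.append(db[ref.hlink].title)
        titles.extend(all_descendants(db, ref.hlink, seen))
    return titles


if __name__ == "__main__":
    db = {
        "B": Source("B", "Parish book", [ChildRef("E1", "baptisms"), ChildRef("E2", "burials")]),
        "E1": Source("E1", "Baptism of John Doe", [ChildRef("B")]),   # a cycle back to the book
        "E2": Source("E2", "Burial of Jane Doe"),
    }
    print(secondary_sources(db, "B", "burials"))   # ['Burial of Jane Doe']
    print(all_descendants(db, "B"))                # ['Baptism of John Doe', 'Burial of Jane Doe']
```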


Jérôme


--- On Sun 19.12.10, Nick Hall <[hidden email]> wrote:

> From: Nick Hall <[hidden email]>
> Subject: Re: [Gramps-devel] Storing data from large sources
> To: "Tim Lyons" <[hidden email]>
> Cc: [hidden email]
> Date: Sunday 19 December 2010, 15:19

Re: Storing data from large sources

Tim Lyons
Administrator
jerome wrote:
> I just wonder whether providing the ability to link sources together (source grouping) would not cover most cases? That was the proposed scheme.
>
> sourceref with hlink and group attributes into a source object.
>
> http://gramps-project.org/wiki/index.php?title=Talk:GEPS_023:_Storing_data_from_large_sources
>
> ex: Book/publication/index gives some source references. The primary source will be the Book/publication/index, and child references will be the secondary sources. Large data might be stored in multiple sources, which are shared between persons, events, etc ...

If I understand your suggestion correctly, this would still leave the current SourceRef, which would contain the Volume/Page and notes, which means the problems with updating this information in many different places remain. There would also be no place to store the Volume/Page on the child reference/secondary source. When producing reports from your suggestion, it would not be obvious where to get the primary information from and where to get the secondary information.

In contrast, with the proposal for Citations, the information in the current SourceRef is moved to the Citation, where it is shared so that it only needs to be updated in one place. When reports are produced, or when the information is output to GEDCOM, the Citation gives the detailed information and the Source gives the general information, and it is clear how to combine these to produce a complete reference text.
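To make the "update in one place" point concrete, here is a minimal sketch assuming plain Python objects. The classes `Source`, `Citation` and `Event`, their fields, and the placeholder book title and author are all hypothetical illustrations, not the Gramps API or the exact GEPS schema. The point is only that several events hold a reference to one shared Citation, and that the reference text is built from the Source (general) plus the Citation (detailed).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    """General information about the whole work (hypothetical fields)."""
    title: str
    author: str = ""


@dataclass
class Citation:
    """Detailed information (e.g. Volume/Page) that several objects can share."""
    source: Source
    page: str
    notes: List[str] = field(default_factory=list)

    def reference_text(self) -> str:
        # Source gives the general part, Citation the detailed part.
        parts = [p for p in (self.source.author, self.source.title, self.page) if p]
        return ", ".join(parts)


@dataclass
class Event:
    description: str
    citations: List[Citation] = field(default_factory=list)


book = Source(title="Family history", author="Anon.")  # placeholder values
page7 = Citation(book, page="p. 7")

# Both events hold a reference to the SAME Citation object, not a copy.
birth_k = Event("Birth of K", [page7])
marriage_k = Event("Marriage of K", [page7])

# Correcting the page once is seen by every event that cites it.
page7.page = "p. 7, para. 2"
print(birth_k.citations[0].reference_text())  # Anon., Family history, p. 7, para. 2
print(marriage_k.citations[0] is birth_k.citations[0])  # True
```

The trade-off is the one Frederico raises further down the thread: anything stored on a shared object applies to every event that points at it.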

Re: Storing data from large sources

jerome
OK, I see. Thank you!

-------------------------------------

Note: I contacted the people behind Gedbas4all; here is Jesper's answer:

"I am also very interested in a cooperation. In preparation for Gedbas4all I took a look at Gramps' data model. One thing that is missing there -- for my private research as well -- is the possibility to create persons from different sources as different persons in Gramps and mark them as "possibly the same". That way it would always be clear which information comes from a source and which information is just an assertion of the researcher.

Our goal for Gedbas4all is a web application. It would be great if there was a desktop application with such functionality, too. A scientific analysis of genealogical information would be much easier. However, I suppose such an extension of Gramps would be too comprehensive and require changes in nearly all parts of the source code.

I took a look at the text of GEPS 23. Together with GEPS 24, good management of complex sources would be a great basis for the efficient capture of genealogical sources like church records, census lists etc. For Gedbas4all we have planned an arbitrary nested tree structure of sources. For an address book it might look like: book -> page -> entry. Every level can have several media objects or clippings attached (as is possible for sources in Gramps right now).

I would like to keep in touch to advance both projects. Or maybe we can even work together closely on specific points."
--Jesper Zedlitz
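The nested "book -> page -> entry" structure Jesper describes can be sketched as a simple tree in which each level carries its own media. This is a hypothetical illustration, assuming a `SourceNode` class invented here; it is not Gedbas4all's or Gramps' actual data model, and the titles and file names are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceNode:
    """One level of a hypothetical nested source tree (book, page, entry...)."""
    title: str
    media: List[str] = field(default_factory=list)  # e.g. scan file names
    parent: Optional[SourceNode] = field(default=None, repr=False, compare=False)
    children: List[SourceNode] = field(default_factory=list)

    def add_child(self, title: str, media: Optional[List[str]] = None) -> SourceNode:
        child = SourceNode(title, list(media or []), parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        # Walk up to the root, then reverse: ["book", "page", "entry"].
        node, titles = self, []
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return titles[::-1]

    def all_media(self) -> List[str]:
        # Media on this level first, then on each enclosing level.
        node, found = self, []
        while node is not None:
            found.extend(node.media)
            node = node.parent
        return found


book = SourceNode("Address book 1900", ["cover.jpg"])  # placeholder data
page = book.add_child("Page 12", ["page12.jpg"])
entry = page.add_child("Entry: Smith, John")

print(" -> ".join(entry.path()))  # Address book 1900 -> Page 12 -> Entry: Smith, John
print(entry.all_media())          # ['page12.jpg', 'cover.jpg']
```

Keeping a parent link lets an entry inherit the scan of the page it sits on, which addresses the complaint that a scan currently has to live on the whole source.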



Jérôme

--- On Sun 19.12.10, Tim Lyons <[hidden email]> wrote:

> From: Tim Lyons <[hidden email]>
> Subject: Re: [Gramps-devel] Storing data from large sources
> To: [hidden email]
> Date: Sunday 19 December 2010, 17:20

Re: Storing data from large sources

lcc .
I can imagine this being quite a simple thing to implement in Gramps.
Marking people as possibly the same? What could be hard about that?
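As a rough illustration of how small the core of such a feature could be, here is a sketch assuming a hypothetical `MatchRegistry` that stores unordered pairs of person IDs with an optional reason. The IDs and the reason text are placeholders; nothing here is existing Gramps code. Both person records stay separate, so source-derived facts and the researcher's "possibly the same" assertion remain distinguishable.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class PossibleMatch:
    """Hypothetical link: two person records *may* be the same person."""
    person_a: str  # e.g. a Gramps ID such as "I0003" (illustrative)
    person_b: str
    reason: str = ""


class MatchRegistry:
    def __init__(self) -> None:
        self._links: Dict[FrozenSet[str], PossibleMatch] = {}

    def mark(self, a: str, b: str, reason: str = "") -> None:
        # Store each pair once, whatever order it is given in.
        key = frozenset((a, b))
        self._links[key] = PossibleMatch(*sorted((a, b)), reason)

    def candidates(self, person: str) -> List[str]:
        # Other records marked as possibly the same as `person`.
        return sorted(
            other
            for key in self._links
            if person in key
            for other in key - {person}
        )


reg = MatchRegistry()
reg.mark("I0007", "I0003", "same name and parish, birth years 2 apart")
reg.mark("I0003", "I0012")
print(reg.candidates("I0003"))  # ['I0007', 'I0012']
```

The harder part, as Jesper suggests, is probably elsewhere: deciding how views, reports and merges should treat records that are only possibly the same.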

lcc

On 12/19/10, jerome <[hidden email]> wrote:


Re: Storing data from large sources

Benny Malengier
In reply to this post by Nick Hall-6


2010/12/19 Nick Hall <[hidden email]>
> Benny,
>
> Tim has done a lot of work on the GEPS, and it is now at a stage where I
> think that it would be helpful if you could review it.
> My main concern is that the Citation Reference editor may be rather
> complicated and large. What do you think?

I'll try to find the time to read it this week. That's a long wiki page!

> Could we combine the Type and Deduction Confidence in the citation
> reference? A Transcript type would imply a high confidence, whereas a
> deduction would imply a lower confidence. We could go down to a "Guess",
> which would imply a very low confidence.
>
> I didn't use the Source Confidence when I started to use Gramps, because
> I was unsure which value to choose. The values in the GEDCOM standard
> (Direct/Primary, Secondary, Questionable, Unreliable) make it more obvious
> how to use them than the Gramps values (Very High, High, Normal, Low, Very
> Low). Perhaps we could choose more descriptive values for the Deduction
> Type?

I added it to the infolabel in trunk. Yes, we can rename it, but then it would be best to show both names, I think:
High - Direct/Primary
....

Or would that look as if we cannot choose?

Benny
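Nick's idea (the citation type implies a confidence) and Benny's idea (show both vocabularies side by side) can be sketched together. Only the pairing High - Direct/Primary comes from the messages; the numeric values, the type-to-confidence mapping, and every other GEDCOM pairing below are illustrative assumptions, and the names `Confidence`, `CitationType` and `label` are hypothetical.

```python
from enum import Enum


class Confidence(Enum):
    # The five Gramps confidence names from the message; numbers assumed.
    VERY_LOW = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3
    VERY_HIGH = 4


class CitationType(Enum):
    # Nick's suggested types, with "Guess" as the lowest rung.
    TRANSCRIPT = "Transcript"
    DEDUCTION = "Deduction"
    GUESS = "Guess"


# Assumption: how each type would imply a confidence level.
IMPLIED_CONFIDENCE = {
    CitationType.TRANSCRIPT: Confidence.HIGH,
    CitationType.DEDUCTION: Confidence.LOW,
    CitationType.GUESS: Confidence.VERY_LOW,
}

# Only HIGH -> "Direct/Primary" is from the thread; the rest are assumed.
GEDCOM_WORDING = {
    Confidence.VERY_HIGH: "Direct/Primary",
    Confidence.HIGH: "Direct/Primary",
    Confidence.NORMAL: "Secondary",
    Confidence.LOW: "Questionable",
    Confidence.VERY_LOW: "Unreliable",
}


def label(conf: Confidence) -> str:
    # Show both vocabularies, e.g. "High - Direct/Primary".
    gramps_name = conf.name.replace("_", " ").title()
    return f"{gramps_name} - {GEDCOM_WORDING[conf]}"


for t in CitationType:
    print(t.value, "->", label(IMPLIED_CONFIDENCE[t]))
# Transcript -> High - Direct/Primary
# Deduction -> Low - Questionable
# Guess -> Very Low - Unreliable
```

Mapping five Gramps levels onto four GEDCOM categories forces two levels to share a label, which may be part of why the combined display could look "as if we cannot choose".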
 


Nick.



Tim Lyons wrote:
> Thanks for your suggestions Benny, I think we may be moving towards a
> consensus. I have created a GEPS to outline a change.
>
> Benny Malengier wrote:
>
>> I don't think SourceContent
>> must be presented in the interface as a core object. Instead a treeview
>> Source-SourceContent seems more natural to me.
>>
>>
> Thanks for the suggestion. A treeview for the Source View and the selector
> works well.
>
> Benny Malengier wrote:
>
>> I would do the attributes differently.
>>
>> Source
>>    1 Title
>>    1 Author
>>    1 Gramps ID
>>    1 Abbr
>>    1 Publication Information
>>    1 Global Confidence
>>    n Publication Data (key value pairs, eg Publication Date, Publisher,
>> ...)
>>    n MediaRef (Region, Src, attr, notes)  --> Media
>>    n RepoRef (Type, Callnumber)           --> Repo
>>
>> SourceContent
>>     1 Source (GrampsID)
>>     1 Confidence (5 values)
>>     1 Volume
>>     1 Page
>>     1 LogDate
>>     1 Linenumber
>>     1 Position (eg. Upper Left Corner of image)
>>     n Information (key, value pairs, current Data)
>>     n NoteIds
>>     n MediaRef (Region, Src, attr, notes)  --> Media
>>
>>
> I agree except that I wouldn't remove the Notes field from the Source. This
> would be too awkward for people who are already using it, and is relevant
> where the source is not 'large'.
>
> I wonder whether we should keep Volume/Page instead of separate Volume,
> Page, Linenumber and Position for this enhancement. There is a proposal
> (GEPS 018) which would change the fields in the SourceContent according to a
> Source Type.
> http://gramps-project.org/wiki/index.php?title=GEPS_018:_Evidence_style_sources
> I wonder whether it would be better that we wait for this, rather than
> changing the fields twice. In any case, there are plenty of cases where
> breakdowns other than the proposed one are more appropriate.
>
> Benny Malengier wrote:
>
>> SourceContentRef (called Citation in the interface, part of objects with
>> sources)
>>   1 Type: Transcript or Deduction
>>   1 Deduction Confidence (5 values)
>>   1 Argumentation (one line string)
>>   n Note
>>
>>
> I have not included this in the GEPS, because it seems to relate to how
> deductions are stored, and as such may not be directly related to this
> enhancement. Also I am concerned that this may make the user experience too
> complicated. In the GEPS, users who are happy with the existing interface
> will see little change (change always frightens users); those who want more
> will be able to use the additional features.
>
> Benny Malengier wrote:
>
>> So, in this design, one must envision that Source and sourcecontent form one
>> single editor.
>>
>>
> I agree, having a single editor makes things simpler for the user and
> ensures that the workflow does not get more complicated.
>
> Benny Malengier wrote:
>
>> When adding a citation to eg a person, you obtain a treeview
>> source-sourcecontent, so if you select a census entry, you immediately see
>> the data of that entry you stored.
>>
>>
> I agree - a treeview will make it no more complicated to select an existing
> Source or SourceContent than it is at present.
>
> Benny Malengier wrote:
>
>> Confidence is given globally, of the content (same as globally by default),
>> and of the deduction. The SourceContentRef is there to hold the process of
>> deducing information you add to eg a person as coming from a source. In many
>> cases, a pure transcript of the source is done, and no deduction happens, in
>> which case this object contains nothing of interest. If one however makes a
>> deduction, then one can store this here specifically. Eg you find the name
>> Nic__ where you cannot make out what the last hand written letters are, and
>> you save the name as Nick, with reference to this source. Then the
>> sourcereference can indicate why you decide to use Nick and not eg Nicki.
>>
>>
> As I mentioned, I have not included the fields of a SourceContentRef in the
> GEPS. They could be added if there is a general desire to do so.
>
>
> The GEPS is at
> http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources
>
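Benny's three-level layout (Source, SourceContent, SourceContentRef) can be sketched with a subset of the fields he lists, using his "Nic__" read as "Nick" example. The Python names, the 0..4 confidence scale, and the register, volume, page, line and argumentation values are placeholders chosen for illustration; they are assumptions, not part of the GEPS.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Source:
    title: str
    author: str = ""
    global_confidence: int = 2  # 0..4 scale assumed


@dataclass
class SourceContent:
    source: Source
    volume: str = ""
    page: str = ""
    line_number: Optional[int] = None
    information: Dict[str, str] = field(default_factory=dict)  # key/value pairs


@dataclass
class SourceContentRef:
    """Called 'Citation' in the interface in Benny's proposal."""
    content: SourceContent
    ref_type: str = "Transcript"          # or "Deduction"
    deduction_confidence: Optional[int] = None
    argumentation: str = ""               # one-line reason for the deduction

    def carries_deduction(self) -> bool:
        # A pure transcript ref holds nothing beyond the link itself.
        return self.ref_type == "Deduction"


register = Source("Baptism register")                 # placeholder data
entry = SourceContent(register, volume="3", page="41", line_number=5,
                      information={"Name as written": "Nic__"})

plain = SourceContentRef(entry)
deduced = SourceContentRef(
    entry, ref_type="Deduction", deduction_confidence=1,
    argumentation="Last letters illegible; chose Nick rather than Nicki")

print(plain.carries_deduction(), deduced.carries_deduction())  # False True
```

Both references point at the same SourceContent, so the transcribed data is stored once, while each reference records only what is specific to the conclusion drawn from it.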


Re: Storing data from large sources

Frederico Munoz
In reply to this post by Tim Lyons
Hello,

Yet again apologies for the delay in participating, this is an issue
that I've previously discussed privately with Tim as well as in the
list.

I have a different take on the problem, which doesn't mean that
changes aren't welcome, especially given the detailed GEPS that Tim
created (thanks a lot for that).

I still have to read the GEPS a few more times to understand it
completely, but for now I have some questions whose answers will help
me understand it more fully.

For now, let me quickly explain my approach to the "problem that needs
to be solved" (http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources).
My way of doing things is actually reflected there, along with some
objections to it, which I will tackle in a minute.

> I have a book that details, on page 7:
>
> “In the 1870s B moved to the town of BT. It was here that I's father K was born in 1860. By the time he was 30 he had married.
> His first child M was born there. Shortly afterwards his wife died and two years later he married G. M was 12 before her brother
> I appeared.”
>

I would create a source for the book, or a source for the book
collection (e.g. Baptism Records for a single church can have many
books, divided by year range, that cover centuries. I only create one
source).

> So I wish to record B, K, and the fact that K was born in 1860, and married around 1890. K's children were M and I, M was born
> around 1890 and I was born around 1900. [Actually, from other sources he was born on 5 Dec 1902.] I need to record page 7
> of this book as the source for all these pieces of information.

I would create a source reference for page 7 of the book, copy it to
the clipboard and use it in each event/assertion. I would further add
a specific *citation* (TEXT_FROM_SOURCE) to each different source
reference that deals only with the specific event. Example: in K's
birth event I would add "... It was here that I's father K was born in
1860...", etc. So, each Source Reference contains a different
citation, making it unique.
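Frederico's workflow (copy one source reference, paste it into each event, then add an event-specific quote) can be sketched with plain objects. The `SourceRef` class and its fields are a hypothetical stand-in for Gramps' source reference, and the source title is a placeholder; the quotes come from the example paragraph in the GEPS. The sketch shows both sides of the debate: each copy can carry its own quote, but each copy must also be corrected separately.

```python
import copy
from dataclasses import dataclass


@dataclass
class SourceRef:
    """Hypothetical unshared, per-event reference."""
    source_title: str
    page: str
    text_from_source: str = ""  # event-specific quote (GEDCOM TEXT)


template = SourceRef("Family history book", page="7")  # placeholder title

# "Copy to the clipboard", paste into each event, then add its own quote.
birth_k = copy.copy(template)
birth_k.text_from_source = "... It was here that I's father K was born in 1860..."

marriage_k = copy.copy(template)
marriage_k.text_from_source = "... By the time he was 30 he had married."

# The copies are independent: a later fix to one does not reach the other.
birth_k.page = "7 (para. 2)"
print(birth_k.page, "|", marriage_k.page)  # 7 (para. 2) | 7
```

That independence is exactly what Frederico relies on for per-event quotes, and exactly what Tim's proposal tries to avoid for shared details such as the page.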

> Some time later I decide I should record a transcript of the source text.

I always add the transcriptions in the Source.

> Some time later, I decide to scan that page of the book, and need to store the scan as the source.

I always add the scans to the Source. Yes, I end up having to look through
the Source gallery for the specific scan that supports the Source
Reference, although this is somewhat mitigated by the use of citations.

> Later still, I discover that page 212 of the same book details that I married W in 1946.

I would add a source reference to the same source, but with a
different page, in the marriage event. I would also add a citation.

> Now I wish to record W, and the marriage of W and I in 1946. The source for all this is page 212 of the book, and this time I record
> the scan against the source.

Same as above.

The objections listed are:

>    * The Source Reference does not allow the Media scan to be stored.

I agree. This was one of my original problems - although the scan
should be present in the Source, a way to link it to a source
reference would be nice.

>    * The Source Reference is not shared, there is a separate instance for each place where it occurs (e.g. each event).

I depend on this behaviour, and the GEPS explicitly mentions the
way I do things. I would like to note that what I do is
already present in GEDCOM
(http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#SOURCE_CITATION):

># Actual text from the source that was used in making assertions, for example a date phrase as actually recorded or an applicable
> sentence from a letter, would be appropriate.
># Data that allows an assessment of the relative value of one source over another for making the recorded assertions (primary or
> secondary source, etc.). Data needed for this assessment is how much time from the asserted fact and when the source event was
> recorded, what type of event was cited, and what was the role of this person in the cited event.
>
>    - Date when the entry was recorded in source document, ".SOUR.DATA.DATE."
>    - Event that initiated the recording, ".SOUR.EVEN."
>    - Role of this person in the event, ".SOUR.EVEN.ROLE".

The second paragraph is something not yet supported in Gramps (see
http://bugs.gramps-project.org/view.php?id=2918&PHPSESSID=c5e5f69a0d0e4353852a8d5aa8ab66ad
and http://bugs.gramps-project.org/view.php?id=2924).
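For reference, here is a sketch of how the GEDCOM 5.5 SOURCE_CITATION fields mentioned above (PAGE, EVEN with ROLE, DATA with DATE and TEXT, QUAY) fit together when written out. The function `source_citation` is hypothetical, not Gramps' exporter; it assumes the pointer form of the citation, and it ignores line-length limits and CONT/CONC continuation lines. The xref, page and role values are placeholders.

```python
from typing import List, Optional


def source_citation(level: int, xref: str, page: str = "", event: str = "",
                    role: str = "", entry_date: str = "", text: str = "",
                    quay: Optional[int] = None) -> List[str]:
    """Render a GEDCOM 5.5 SOURCE_CITATION (pointer form) as text lines."""
    n = level
    lines = [f"{n} SOUR @{xref}@"]
    if page:
        lines.append(f"{n + 1} PAGE {page}")
    if event:
        lines.append(f"{n + 1} EVEN {event}")        # event type cited from, e.g. BIRT
        if role:
            lines.append(f"{n + 2} ROLE {role}")     # this person's role in it
    if entry_date or text:
        lines.append(f"{n + 1} DATA")
        if entry_date:
            lines.append(f"{n + 2} DATE {entry_date}")  # .SOUR.DATA.DATE
        if text:
            lines.append(f"{n + 2} TEXT {text}")        # TEXT_FROM_SOURCE
    if quay is not None:
        lines.append(f"{n + 1} QUAY {quay}")         # 0 (unreliable) .. 3 (direct/primary)
    return lines


print("\n".join(source_citation(
    2, "S1", page="p. 7", event="BIRT", role="CHIL",
    text="It was here that I's father K was born in 1860", quay=2)))
```

The EVEN/ROLE and DATA/DATE lines correspond to the assessment data that, as noted above, Gramps does not yet store.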

Having said that:

> Note that there is an argument that separate source references for the different events is preferable, because the exact text
> that relates to that particular event can be attached. For example, for the birth event for person K, one could attach: “…the
> town of BT. It was here that I's father K was born in 1860…”. There are two objections to this:
>
>    * It is difficult to identify exactly which parts of the text are relevant to each event. Should I’s father be included in the source for K’s birth?

While there will always be a need for some personal judgement (this
is far from an exact science), this is no different from other
decisions about where source references should be added. I do not
understand the objection very well (my fault), but yes: if the
supporting information concerning K's birth is derived from that
sentence, I would use "...It was here that I's father K was born in
1860...". If I knew which place "here" refers to, I would
put it inside brackets. This way I will know exactly why I have K's
birth in 1860. Without this sort of event-specific citation (read:
TEXT_FROM_SOURCE, a source note added to a source reference), I would
have to find out by reading the entire source.

>    * It is far too tedious and laborious to devise separate source texts for each event. Given that the original paragraph giving family history
> information (this is a genuine example) is quite short, it is much quicker and easier to include the whole paragraph in each reference.

Well, sharing the Source Text note would be an option... the problem
here is that while sharing a supporting citation works for one-liners,
not all sources (and certainly not most, in my experience) are like that.
And by making Source References "shared", it is no longer
possible to provide adequate citations that support the specific
event.

> When it comes to adding the scan, the only option I really have is to add it to the source itself, despite the fact that the scan only relates to one page.

True, but the scanned image is from the Source. What I miss is a way
to specify in a specific source reference that a certain page from the
Source is used (not transferring the scans from sources to source
references).

So, for me it is important that any improvement maintains the ability
to keep content specific to each source reference. I use citations for
everything (this is why they exist; in PAF, for example, citations
have a first-class UI element that helps a lot, and I have made a feature
request about this), so sharing Source References would not work, since
changing a citation would mean changing all of them.

A different matter is the way to "split" sources. Again using Church
Records as an example, I have often felt the need for a hierarchical
classification, similar to what is used in the repositories I use:
looking at http://pesquisa.adporto.pt/cravfrontoffice/default.aspx?page=regShow&ID=488904&searchMode=as
one can see a tree on the right side. The organisation is
hierarchical, with a top category for the Parish, which contains
different "series" (Baptisms, Marriages, etc.), each containing an
"installation unit" (a specific book, for a specific time period).
Since sources in Gramps are "flat", this is not entirely different from
what was done with Places.

I'm sure that this GEPS will lead to a better way to do things, I'm
just presenting some initial considerations that I deem relevant.

Cheers,

Frederico


Re: Storing data from large sources

Gerald Britton-2
Whew!  Your idea of a quick answer and mine are clearly not in
alignment!  Anyway, I appreciate your insight, and for many things your
approach mirrors my own.  As I originally stated the problem, however,
I was referring to bulk sources such as censuses or BMD registers that
contain data on thousands or even millions of people.  Plus, these
sources frequently contain interesting information that does not have a
natural home in GEDCOM or Gramps, other than as textual data.

For example, Canadian censuses often record the construction material
of the house where a family lived when it was polled.  Now, I may
eventually want a Residence event; then, the construction material
might be a good event attribute.  However, I wish to capture the data
from the census *all at once* and *in one place*, including an image
of the page where the data is found.  Later, I can build other event
types from the data.  Also, I would like to have key/value pairs for
each data point, for ease of comparison with other censuses.  This
idea forms the foundation of the GEPS, I believe.
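A small sketch of why key/value pairs make comparison easy: once two censuses store their data points as dictionaries, the differences fall out of a simple key-by-key comparison. The function `compare_census` and all the keys and values are illustrative placeholders. In practice, keys would first need to be normalised across censuses, since column names differ from year to year.

```python
from typing import Dict, Optional, Tuple


def compare_census(earlier: Dict[str, str],
                   later: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Report keys whose values differ, or that appear in only one census."""
    changes = {}
    for key in sorted(set(earlier) | set(later)):
        a, b = earlier.get(key), later.get(key)
        if a != b:
            changes[key] = (a, b)
    return changes


# Illustrative values only.
c1901 = {"House material": "Frame", "Stories": "1", "Sheep": "40"}
c1911 = {"House material": "Brick", "Stories": "1", "Employer": "Yes"}

for key, (old, new) in compare_census(c1901, c1911).items():
    print(f"{key}: {old} -> {new}")
# Employer: None -> Yes
# House material: Frame -> Brick
# Sheep: 40 -> None
```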

So I would have an Event (Census), with a Source (1901 Census of
Canada), with a Source Reference (RG31, Alberta, Calgary, District 35,
Sub District 1, page 2, line 3) and a matching Source Contents
containing all the data points -- as key, value pairs -- on that line
in the census (for at least one Canadian census, there are over 100
data points!) plus a Media Object reference to the image of the page
itself -- either stored locally or on the Library and Archives Canada
site (which is also the Repository for my source).
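The arrangement just described (a Census event, its Source, a reference down to the census line, and Source Contents holding every data point plus the page image) might be modelled as follows, together with the later step of deriving a Residence event from the stored data. The classes, the `derive_residence` helper and the data values are hypothetical illustrations; only the citation string and the source name come from the message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceContents:
    """One census line: all of its data points plus the page image."""
    citation: str                      # where in the source
    data: Dict[str, str] = field(default_factory=dict)
    image_url: Optional[str] = None    # local path or archive URL


@dataclass
class Event:
    kind: str
    source: str
    contents: SourceContents
    attributes: Dict[str, str] = field(default_factory=dict)


line3 = SourceContents(
    citation="RG31, Alberta, Calgary, District 35, Sub District 1, page 2, line 3",
    data={"House material": "Frame", "Stories": "2", "Sheep": "12"},  # illustrative
    image_url=None,  # e.g. a link to the page image at Library and Archives Canada
)
census = Event("Census", "1901 Census of Canada", line3)


def derive_residence(census_event: Event, keys: List[str]) -> Event:
    # Later on, build a Residence event that reuses the same stored contents
    # and copies selected data points into its own attributes.
    data = census_event.contents.data
    attrs = {k: data[k] for k in keys if k in data}
    return Event("Residence", census_event.source, census_event.contents, attrs)


residence = derive_residence(census, ["House material", "Stories"])
print(residence.attributes)  # {'House material': 'Frame', 'Stories': '2'}
```

Because both events point at the same contents, the census line is captured "all at once and in one place", while derived events pick out only the data points relevant to them.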

The Census gramplet does much of what I'm talking about, except that it
stores the attributes as Event Reference attributes.  That has
limitations (especially shareability), and I would argue that the number
of sheep my g.grandfather had is not an attribute of the Census but
rather of my grandfather or perhaps the farm he had at the time.  It
is the search for a more general solution that prompted this thread
and the GEPS.

On Sat, Jan 1, 2011 at 1:36 PM, Frederico Muñoz <[hidden email]> wrote:

> Hello,
>
> Yet again apologies for the delay in participating, this is an issue
> that I've previously discussed privately with Tim as well as in the
> list.
>
> I have a different take on the problem, which doesn't mean that
> changes aren't welcome - especially given the detailed GEPS than Tim
> created, thanks a lot for that.
>
> I still have to read the GEPS more times to completely understand it
> but for now I have some doubts whose clarification will help in my
> more complete understanding of it.
>
> For now let me answer quickly my approach to the "problem that needs
> to be solved" (http://gramps-project.org/wiki/index.php?title=GEPS_023:_Storing_data_from_large_sources).
> My way of doing things is actually reflected in there, with some
> objections, which I will tackle in a minute.
>
>> I have a book that details, on page 7:
>>
>> “In the 1870s B moved to the town of BT. It was here that I's father K was born in 1860. By the time he was 30 he had married.
>> His first child M was born there. Shortly afterwards his wife died and two years later he married G. M was 12 before her brother
>> I appeared.”
>>
>
> I would create a source for the book, or a source for the book
> collection (e.g. Baptism Records for a single church can have many
> books, divided by year range, that cover centuries. I only create one
> source).
>
>> So I wish to record B, K, and the fact that K was born in 1860, and married around 1890. K's children were M and I, M was born
>> around 1890 and I was born around 1900. [Actually, from other sources he was born on 5 Dec 1902.] I need to record page 7
>> of this book as the source for all these pieces of information.
>
> I would create a source reference for page 7 of the book, copy it to
> the clipboard and use it in each event/assertion. I would further add
> a specific *citation* (TEXT_FROM_SOURCE) to each different source
> reference that deals only with the specific event. Example: in K's
> birth event I would add "... It was here that I's father K was born in
> 1860...", etc. So, each Source Reference contains a different
> citation, making it unique.
>
>> Some time later I decide I should record a transcript of the source text.
>
> I always add the transcriptions in the Source.
>
>> Some time later, I decide to scan that page of the book, and need to store the scan as the source.
>
> I always add the scans to the Source. Yes, I end up having to look at
> the Source gallery for the specific scan that supports the Source
> Reference, although this is mitigated by the use of citations somewhat
>
>> Later still, I discover that page 212 of the same book details that I married W in 1946.
>
> I would add a source reference to the same source, but with a
> different page, in the marriage event. Would also add a citation.
>
>> Now I wish to record W, and the marriage of W and I in 1946. The source for all this is page 212 of the book, and this time I record
>> the scan against the source.
>
> Same as above.
>
> The objections listed are:
>
>>    * The Source Reference does not allow the Media scan to be stored.
>
> I agree. This was one of my original problems - although the scan
> should be present in the Source, a way to link it to a source
> reference would be nice.
>
>>    * The Source Reference is not shared, there is a separate instance for each place where it occurs (e.g. each event).
>
> I depend on this behaviour, and the GEPS makes explicit mention of the
> way I do things. I would like to note that what I do is already
> present in GEDCOM
> (http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#SOURCE_CITATION):
>
>> 1. Actual text from the source that was used in making assertions, for example a date phrase as actually recorded or an applicable
>>    sentence from a letter, would be appropriate.
>> 2. Data that allows an assessment of the relative value of one source over another for making the recorded assertions (primary or
>>    secondary source, etc.). Data needed for this assessment is how much time from the asserted fact and when the source event was
>>    recorded, what type of event was cited, and what was the role of this person in the cited event.
>>
>>    - Date when the entry was recorded in source document, ".SOUR.DATA.DATE".
>>    - Event that initiated the recording, ".SOUR.EVEN".
>>    - Role of this person in the event, ".SOUR.EVEN.ROLE".
>
> The second paragraph is something not yet supported in Gramps (see
> http://bugs.gramps-project.org/view.php?id=2918
> and http://bugs.gramps-project.org/view.php?id=2924).
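A sketch that writes one SOURCE_CITATION as GEDCOM 5.5 lines shows the three GEDCOM tags quoted above in context. The function name and the `S1` cross-reference are hypothetical. The sketch ignores GEDCOM's line-length limit and CONC/CONT continuation, and it does not claim to reflect how Gramps' exporter works.

```python
def gedcom_citation(level, source_xref, page=None, event=None, role=None,
                    entry_date=None, text=None):
    """Return the lines of a GEDCOM 5.5 SOURCE_CITATION starting at `level`.

    Hypothetical helper: long TEXT values are not split with CONC/CONT.
    """
    lines = [f"{level} SOUR @{source_xref}@"]
    if page:
        lines.append(f"{level + 1} PAGE {page}")            # WHERE_WITHIN_SOURCE
    if event:
        lines.append(f"{level + 1} EVEN {event}")           # .SOUR.EVEN
        if role:
            lines.append(f"{level + 2} ROLE {role}")        # .SOUR.EVEN.ROLE
    if entry_date or text:
        lines.append(f"{level + 1} DATA")
        if entry_date:
            lines.append(f"{level + 2} DATE {entry_date}")  # .SOUR.DATA.DATE
        if text:
            lines.append(f"{level + 2} TEXT {text}")        # TEXT_FROM_SOURCE
    return lines


# K's birth event (nested under "1 BIRT"), citing page 7 of source @S1@.
for line in gedcom_citation(2, "S1", page="p. 7", event="BIRT", role="CHIL",
                            text="...It was here that I's father K was born in 1860..."):
    print(line)
# 2 SOUR @S1@
# 3 PAGE p. 7
# 3 EVEN BIRT
# 4 ROLE CHIL
# 3 DATA
# 4 TEXT ...It was here that I's father K was born in 1860...
```

The citation is embedded under the event, while the `SOUR` record it points to is shared. This matches the "shared Source, per-event reference" arrangement discussed here.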
>
> Having said that:
>
>> Note that there is an argument that separate source references for the different events is preferable, because the exact text
>> that relates to that particular event can be attached. For example, for the birth event for person K, one could attach: “…the
>> town of BT. It was here that I's father K was born in 1860…”. There are two objections to this:
>>
>>    * It is difficult to identify exactly which parts of the text are relevant to each event. Should I’s father be included in the source for K’s birth?
>
> While there will always be the need for some personal judgement (this
> is far from an exact science), this is no different from other
> decisions about where source references should be added. I do not
> understand the objection very well (my fault), but yes: if the
> supporting information concerning K's birth is derived from that
> sentence I would use "...It was here that I's father K was born in
> 1860...". If I knew which place "here" refers to, I would
> put it inside brackets. This way I will know exactly why I have K's
> birth in 1860. Without these event-specific citations (read:
> TEXT_FROM_SOURCE, a source note added to a source reference) I would
> have to go and find out by reading the entire source.
>
>>    * It is far too tedious and laborious to devise separate source texts for each event. Given that the original paragraph giving family history
>> information (this is a genuine example) is quite short, it is much quicker and easier to include the whole paragraph in each reference.
>
> Well, sharing the Source Text note would be an option... The problem
> here is that while sharing a supporting citation works for one-liners,
> not all (and certainly not most, in my experience) sources are like that.
> And making Source References "shared" would make it impossible
> to provide adequate citations that support the specific event.
>
>> When it comes to adding the scan, the only option I really have is to add it to the source itself, despite the fact that the scan only relates to one page.
>
> True, but the scanned image is from the Source. What I miss is a way
> to specify in a specific source reference that a certain page from the
> Source is used (not transferring the scans from sources to source
> references).
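This sketch shows one possible form of the missing link: a reference that points at one scan while the scan itself stays in the Source gallery. It uses a hypothetical model. The `scan` field and the check in `__post_init__` are assumptions for illustration and are not part of Gramps.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Media:
    path: str
    description: str = ""


@dataclass
class Source:
    title: str
    gallery: list = field(default_factory=list)  # scans stay on the Source


@dataclass
class SourceReference:
    source: Source
    page: str
    citation: str = ""
    scan: Optional[Media] = None  # hypothetical: which gallery image backs this reference

    def __post_init__(self):
        # The scan is only a pointer; it must already be in the Source's gallery.
        if self.scan is not None and not any(m is self.scan for m in self.source.gallery):
            raise ValueError("scan is not in the referenced Source's gallery")


page7 = Media("scans/book_p007.jpg", "page 7")
page212 = Media("scans/book_p212.jpg", "page 212")
book = Source("Family history (hypothetical title)", [page7, page212])

marriage_ref = SourceReference(book, "p. 212", scan=page212)
```

The reference stores a pointer, not its own copy of the image. Scans therefore are not transferred from sources to source references, yet each reference can still say which page image supports it.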
>
> So, for me it is important that any improvement maintains the ability
> to keep content specific to each source reference. I use citations for
> everything (this is why they exist; in PAF, for example, citations
> have a first-order UI element that helps a lot, and I have made a feature
> request about this), so sharing Source References would not work:
> changing a citation would mean changing all of them.
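Plain Python objects can show why shared references clash with per-event citations. This sketch uses a hypothetical `SourceReference` class. Copying a template with `copy.copy` stands in for pasting a reference from the clipboard, which is an assumption about how that copy behaves.

```python
import copy
from dataclasses import dataclass


@dataclass
class SourceReference:
    """Hypothetical reference: a source handle, a page and an event-specific citation."""
    source_handle: str
    page: str
    citation: str = ""


BIRTH_TEXT = "...It was here that I's father K was born in 1860..."

# Shared: both events hold the *same* reference object.
shared = SourceReference("S1", "p. 7")
birth_refs, marriage_refs = [shared], [shared]
birth_refs[0].citation = BIRTH_TEXT
assert marriage_refs[0].citation == BIRTH_TEXT  # the marriage now shows the birth text too

# Per-event copies: each event gets its own instance of the template.
template = SourceReference("S1", "p. 7")
birth_refs, marriage_refs = [copy.copy(template)], [copy.copy(template)]
birth_refs[0].citation = BIRTH_TEXT
assert marriage_refs[0].citation == ""  # unaffected by the edit to the birth reference
```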
>
> A different matter is the way to "split" sources. Again using Church
> Records as an example, I have often felt the need for a hierarchical
> classification, similar to what is used in the repositories I use:
> looking at http://pesquisa.adporto.pt/cravfrontoffice/default.aspx?page=regShow&ID=488904&searchMode=as
> one can see a tree on the right side. The organisation is
> hierarchical, with a top category for the Parish, which contains
> different "series" (Baptisms, Marriages, etc.), each containing
> "installation units" (a specific book for a specific time period).
> Since sources in Gramps are "flat", this would not be entirely different from
> what was done with Places.
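The sketch below shows one possible form of the hierarchical source tree described (Parish, series, installation unit). `SourceNode`, its `level` labels and the example titles are hypothetical. Gramps sources have no such nesting, and the sketch only illustrates the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceNode:
    """Hypothetical nested source container; not an existing Gramps class."""
    title: str
    level: str  # e.g. "parish", "series" or "unit"
    children: list = field(default_factory=list)
    # Back-link to the container; excluded from repr/eq to avoid infinite recursion.
    parent: Optional["SourceNode"] = field(default=None, repr=False, compare=False)

    def add(self, title, level):
        """Create a child container under this one and return it."""
        child = SourceNode(title, level, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Full path from the top container down to this one."""
        parts, node = [], self
        while node is not None:
            parts.append(node.title)
            node = node.parent
        return " > ".join(reversed(parts))

    def units(self):
        """All leaf containers (e.g. individual books) below this node."""
        if not self.children:
            return [self]
        return [u for child in self.children for u in child.units()]


parish = SourceNode("Parish A", "parish")
baptisms = parish.add("Baptisms", "series")
marriages = parish.add("Marriages", "series")
baptisms.add("Baptisms, book 1", "unit")
book = marriages.add("Marriages, book 1", "unit")

print(book.path())                        # Parish A > Marriages > Marriages, book 1
print([u.title for u in parish.units()])  # ['Baptisms, book 1', 'Marriages, book 1']
```

Places use the same idea: each node knows its container, so the full "Parish > series > book" path can be rebuilt for any single book.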
>
> I'm sure that this GEPS will lead to a better way to do things, I'm
> just presenting some initial considerations that I deem relevant.
>
> Cheers,
>
> Frederico
>
> ------------------------------------------------------------------------------
> Learn how Oracle Real Application Clusters (RAC) One Node allows customers
> to consolidate database storage, standardize their database environment, and,
> should the need arise, upgrade to a full multi-node Oracle RAC database
> without downtime or disruption
> http://p.sf.net/sfu/oracle-sfdevnl
> _______________________________________________
> Gramps-devel mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>



--
Gerald Britton
