self.db.iter_object_handles(sort_handles=True)

self.db.iter_object_handles(sort_handles=True)

jerome
Hi,


I am trying to get an answer to a question about the code: why can't we keep the order of objects when a Gramps XML file is imported and then exported again?

Nick pointed out that objects are not ordered on export[1].
Why? I suppose backup scripts or revision control tools would work better with ordered objects! Anyway, using 'sort_handles=True' works on export, except for family handles. Any reason for that? A typo somewhere? On my side?


[1] http://www.gramps-project.org/bugs/view.php?id=4365

regards,
Jérôme


     

------------------------------------------------------------------------------
Protect Your Site and Customers from Malware Attacks
Learn about various malware tactics and how to avoid them. Understand
malware threats, the impact they can have on your business, and how you
can protect your company and customers by using code signing.
http://p.sf.net/sfu/oracle-sfdevnl
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: self.db.iter_object_handles(sort_handles=True)

Gerald Britton-2
The data is not ordered since it comes from bsddb in random order.  If
we ordered it, we would have to sort it by some key.  So, if we did,
what keys would you use for:

person
family
event
source
place
repository
note
media object

--
Gerald Britton


Re: self.db.iter_object_handles(sort_handles=True)

jerome
I am not sure I understand...
Shouldn't the keys be handles?

'self.db.get_{object}_handles(sort_handles=True)' is allowed,
but 'self.db.iter_{object}_handles(sort_handles=True)' is not!

There are two questions:

1. Why does Gramps use self.db.iter_family_handles() only for families, and self.get_{object}_handles() for the others, where {object} is person, event, source, place, repository, note or media object?

2. Why is the 'sort_handles=True' argument allowed on all primary objects except the family object?

> The data is not ordered since it
> comes from bsddb in random order.

This could explain why I will not be able to keep the order on XML import (to bsddb). :(


Thanks.
Jérôme


Re: self.db.iter_object_handles(sort_handles=True)

Gerald Britton-2
On Fri, Jan 14, 2011 at 3:11 PM, jerome <[hidden email]> wrote:
> I am not sure I understand...
> Shouldn't the keys be handles?

Well, that's the question!  I can see a case for gramps ids, or
surnames, or event dates, etc. etc.

>
> 'self.db.get_{object}_handles(sort_handles=True)' is allowed,
> but 'self.db.iter_{object}_handles(sort_handles=True)' is not!
>
> There are two questions:
>
> 1. Why does Gramps use self.db.iter_family_handles() only for families, and self.get_{object}_handles() for the others, where {object} is person, event, source, place, repository, note or media object?

The get_...handles methods return a list, which can be expensive in
memory and must read all objects in one pass.  The iter... methods
just return one handle at a time, so they are cheaper in memory.  So, the
iter... methods are preferable.  OTOH, they cannot do sorting, since by
definition you need to read all records before you can sort them.
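
To make the trade-off concrete, here is a minimal sketch in plain Python. It does not use the Gramps API: `FakeDb`, its `_family_map` dict and both methods are hypothetical stand-ins that only mimic the shape of the get_...handles and iter_...handles calls discussed here.

```python
class FakeDb:
    """Hypothetical stand-in for a handle table; not the Gramps database class."""

    def __init__(self, handles):
        # A dict plays the role of the key/value store.
        self._family_map = {h: None for h in handles}

    def get_family_handles(self, sort_handles=False):
        # Builds the whole list in memory, so sorting is possible.
        handles = list(self._family_map)
        if sort_handles:
            handles.sort()
        return handles

    def iter_family_handles(self):
        # Yields one handle at a time in storage order; no sorting here.
        for handle in self._family_map:
            yield handle


db = FakeDb(["c3", "a1", "b2"])
streamed = list(db.iter_family_handles())             # storage order
sorted_list = db.get_family_handles(sort_handles=True)
sorted_from_iter = sorted(db.iter_family_handles())   # sorted() reads every item first
```

Calling sorted() on the iterator gives the same result as the list version, but it has to consume the whole iterator before it can return anything, so the memory saving is lost.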

>
> 2. Why is the 'sort_handles=True' argument allowed on all primary objects except the family object?

I suppose that there has been no requirement so far so no one coded it up.



--
Gerald Britton


Re: self.db.iter_object_handles(sort_handles=True)

jerome
> > I am not sure I understand...
> > Shouldn't the keys be handles?
>
> Well, that's the question!  I can see a case for gramps ids, or
> surnames, or event dates, etc. etc.

But the handle is the easiest and safest key for ordering our data.

Gramps ids could be exotic!
Surnames are not a good key :(
date => date_object => year, then month, then day, then rank, etc. = a horrible index

My problem is in plugins/export/ExportXML.py.

I saw a sortByID function that is not used, then lists in some places (get_...), and iteration in another (only for family handles).

I thought of using lists sorted by handle to get an ordering rule. I do not want to group handles; handles will be grouped in the Gramps XML, so it was not planned to parse one flat XML file or something like that!

But that is not my main problem...
I thought that sorting handles would make the object lists consistent (Persons, Families, Events, etc.).

Every time I import a Gramps XML file, Gramps rebuilds (write, DB commit) some objects! The change time is not the same after a simple import followed by an export.

I can understand the random order used by bsddb, but it should not apply to some objects (like families) and not to the others.

In my mind, an import without any DB change is like a "read-only" operation, but that is not the case. OK, you are saying that this is how bsddb works. Still, it should be possible to use 'diff' or revision control tools on the XML files. With the current Gramps XML import/export, these tools are of limited use. :(


Jérôme



Re: self.db.iter_object_handles(sort_handles=True)

Gerald Britton-2
On Fri, Jan 14, 2011 at 3:59 PM, jerome <[hidden email]> wrote:
>> > I am not sure I understand...
>> > Shouldn't the keys be handles?
>>
>> Well, that's the question!  I can see a case for gramps ids, or
>> surnames, or event dates, etc. etc.
>
> But the handle is the easiest and safest key for ordering our data.

Only if that's the order you want.

>
> Gramps ids could be exotic!

Do you mean unique?  Anyway, it is a good sort-key candidate.

> Surnames are not a good key :(

I can see that some would like it... it makes the XML easier for a human to read.

> date => date_object => year, then month, then day, then rank, etc. = a horrible index

Probably, but it's just one possibility.

>
> My problem is in plugins/export/ExportXML.py.
>
> I saw a sortByID function that is not used, then lists in some places (get_...), and iteration in another (only for family handles).
>
> I thought of using lists sorted by handle to get an ordering rule. I do not want to group handles; handles will be grouped in the Gramps XML, so it was not planned to parse one flat XML file or something like that!
>
> But that is not my main problem...
> I thought that sorting handles would make the object lists consistent (Persons, Families, Events, etc.).
>
> Every time I import a Gramps XML file, Gramps rebuilds (write, DB commit) some objects! The change time is not the same after a simple import followed by an export.

Well, they all need new handles, right?  There is a possibility of collisions.
The same goes for gramps ids.

>
> I can understand the random order used by bsddb, but it should not apply to some objects (like families) and not to the others.
>
> In my mind, an import without any DB change is like a "read-only" operation, but that is not the case. OK, you are saying that this is how bsddb works. Still, it should be possible to use 'diff' or revision control tools on the XML files. With the current Gramps XML import/export, these tools are of limited use. :(

Yep.  You're probably looking for something like a UUID for each
record.  Not a bad idea but not implemented at the moment.



--
Gerald Britton


Re: self.db.iter_object_handles(sort_handles=True)

jerome
> > Gramps ids could be exotic!
> Do you mean unique?  Anyway, it is a good sort-key candidate.

ids = [I000001, IAYUTRE235, zharb, /empty/ , etc ...]

In 'handle' I trust! ;)
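
As a rough illustration of the point, the sketch below sorts the same records once by handle and once by gramps id. The ids are the ones listed above; the handles and the record layout are invented for the example and are not real Gramps data.

```python
from operator import itemgetter

# Invented sample records: (handle, gramps_id). Handles are made up;
# the ids mirror the "exotic" examples from the list above.
people = [
    ("_b7f2", "I000001"),
    ("_a1c9", "zharb"),
    ("_d9e3", "IAYUTRE235"),
    ("_c4d0", ""),          # an object with an empty id
]

by_handle = sorted(people, key=itemgetter(0))
by_gramps_id = sorted(people, key=itemgetter(1))

handles_in_handle_order = [handle for handle, _ in by_handle]
handles_in_id_order = [handle for handle, _ in by_gramps_id]
```

Both orders are deterministic, but the id order depends on whatever the user typed (or left empty), while every object always has a handle.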

> > Every time I import a Gramps XML file, Gramps rebuilds (write, DB commit) some objects! The change time is not the same after a simple import followed by an export.
> Well, they all need new handles, right?  There is a possibility of collisions.
> The same goes for gramps ids.

In fact, I want to keep the handles: they should be the reference keys.

My problem could be illustrated by something like:

$ gramps -i import.gramps -e export.gramps
$ gunzip < import.gramps > import.xml
$ gunzip < export.gramps > export.xml
$ diff -u import.xml export.xml > diff.txt

where import.gramps is our "scientific control".

What should the content of diff.txt be?

For me, it should be a few lines...
Unfortunately there are some changes (order, change time on family objects): that's strange!
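
The same comparison can be scripted. The sketch below is only an illustration using the Python standard library (gzip and difflib); the helper name `diff_gramps_files` is made up, and it assumes that both files are gzip-compressed XML, as in the commands above.

```python
import difflib
import gzip


def diff_gramps_files(path_a, path_b):
    """Return the unified diff of two gzip-compressed text files as a list of lines."""
    with gzip.open(path_a, "rt", encoding="utf-8") as file_a:
        lines_a = file_a.readlines()
    with gzip.open(path_b, "rt", encoding="utf-8") as file_b:
        lines_b = file_b.readlines()
    # unified_diff yields nothing when the two line lists are identical.
    return list(
        difflib.unified_diff(lines_a, lines_b, fromfile=path_a, tofile=path_b)
    )
```

With this helper, diff_gramps_files("import.gramps", "export.gramps") would return an empty list only if the round trip changed nothing, including the order of the objects.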



Jérôme



Re: self.db.iter_object_handles(sort_handles=True)

DS Blank
On Fri, Jan 14, 2011 at 4:31 PM, jerome <[hidden email]> wrote:

>> > Gramps ids could be exotic!
>> Do you mean unique?  Anyway, it is a good sort-key candidate.
>
> ids = [I000001, IAYUTRE235, zharb, /empty/ , etc ...]
>
> In 'handle' I trust! ;)
>
>> > Every time I import a Gramps XML file, Gramps rebuilds (write, DB commit) some objects! The change time is not the same after a simple import followed by an export.
>> Well, they all need new handles, right?  There is a possibility of collisions.
>> The same goes for gramps ids.
>
> In fact, I want to keep the handles: they should be the reference keys.
>
> My problem could be illustrated by something like:
>
> $ gramps -i import.gramps -e export.gramps
> $ gunzip < import.gramps > import.xml
> $ gunzip < export.gramps > export.xml
> $ diff -u import.xml export.xml > diff.txt
>
> where import.gramps is our "scientific control".
>
> What should the content of diff.txt be?
>
> For me, it should be a few lines...
> Unfortunately there are some changes (order, change time on family objects): that's strange!

Yes, it would be handy to do this. This might be called "idempotent"
by a mathematician: if the round-trip through gramps was idempotent,
then the diff would be empty.

What we need is:

1. something smarter than diff for this usage
2. sort on something that doesn't change (like the handle), just for
this purpose
3. make it so that the order is preserved

I would lean towards #3. I've "fixed" some other places where the
order was lost. If you let me know which orders are lost, I'll
address them.

-Doug
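
Option 2 in the list above could be sketched roughly as follows. This is only a minimal sketch, not Gramps code: it assumes that each primary object in the exported XML sits directly under a container element and carries a `handle` attribute, and the helper names `normalize` and `round_trip_diff` are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def normalize(xml_text):
    """Return one string per object, with the objects of each container
    sorted by their 'handle' attribute (attribute name is an assumption)."""
    root = ET.fromstring(xml_text)
    lines = []
    for container in root:
        lines.append("<%s>" % container.tag)
        # sorted() is stable, so children without a handle keep their order.
        for child in sorted(container, key=lambda c: c.get("handle", "")):
            child.tail = None  # drop indentation that follows the element
            lines.append(ET.tostring(child, encoding="unicode"))
    return lines


def round_trip_diff(before_xml, after_xml):
    """Unified diff of two exports, ignoring the order of the objects."""
    return list(difflib.unified_diff(
        normalize(before_xml), normalize(after_xml),
        "import.xml", "export.xml", lineterm=""))
```

The two arguments would be the contents of the gunzipped `import.xml` and `export.xml`. With objects sorted this way, a pure reordering gives an empty diff, so whatever remains (for example a different change time) is a real difference.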



Re: self.db.iter_object_handles(sort_handles=True)

jerome
> Yes, it would be handy to do this. This might be called "idempotent"
> by a mathematician: if the round-trip through gramps was idempotent,
> then the diff would be empty.

That's exactly what I tried to do.
I learned one word! :)
Thanks!

> 3. make it so that the order is preserved
>
> I would lean towards #3. I've "fixed" some other places
> where the order was lost. If you let me know which orders are lost, I'll address.

At a glance, I would say events, notes, and places.

But there is something else:
1. some families are rewritten (their change time differs)
2. small samples are not reordered! A cache limit?

http://www.gramps-project.org/bugs/view.php?id=4365


Jérôme




     


Re: self.db.iter_object_handles(sort_handles=True)

jerome
In reply to this post by DS Blank
> if the round-trip through gramps was idempotent, then the diff would be empty.

The expected result was a minor change in the generation date (if generated
on another day) and maybe in media objects (media paths).

I do not expect a fully idempotent round-trip, but currently we
cannot easily get the differences. I just wanted to test a complete XML
migration before a major release.


Jérôme





Re: self.db.iter_object_handles(sort_handles=True)

Benny Malengier
We should _never_ order on export.
We should only access things via an index in the database.

Ordering would mean a huge time penalty on export for those with very large family trees.
Even exporting along a bsddb index would be much slower than what we do now, which is to go from database page to database page.

Just looping over the data and exporting means the hard disk is read as little as possible (it goes from database page to database page).

In other words:
1/ the default should be just a cursor over the database table, so order cannot be maintained
2/ ordered output could be optional. If we add an ordered output, it should follow an index of the database, so that no in-memory sorting has to occur before the export can be done. I think ID has a sorted index over it. Handle normally does too, as it is the primary key and will hence be in some sort of B-tree. You must be sure to use the sorted index when looping, however.

Benny
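
The two modes above can be illustrated with a small sketch. It is not Gramps code and does not use bsddb: SQLite from the Python standard library stands in for the database, and the table layout and the helper `iter_family_handles` are assumptions made for this example only. The point is that the ordered variant asks the database to follow the primary-key index, so no list has to be built and sorted in memory before looping.

```python
import sqlite3

# In-memory stand-in for the family table; 'handle' is the primary key,
# so the database keeps an index over it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE family (handle TEXT PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO family VALUES (?, ?)",
    [("_c3", "family 3"), ("_a1", "family 1"), ("_b2", "family 2")],
)


def iter_family_handles(conn, sort_handles=False):
    """Yield handles one at a time from a database cursor.

    Default: plain table scan, no promise about order.
    sort_handles=True: the database is asked to return rows in key order.
    """
    sql = "SELECT handle FROM family"
    if sort_handles:
        sql += " ORDER BY handle"
    for (handle,) in conn.execute(sql):
        yield handle


unordered = list(iter_family_handles(conn))
ordered = list(iter_family_handles(conn, sort_handles=True))
```

Either way the caller only ever holds one handle at a time; the cost of the ordered variant lies in how the database has to read its pages, which is the penalty described above.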

2011/1/15 Jérôme <[hidden email]>
> if the round-trip through gramps was idempotent, then the diff would be empty.

Expected result was: minor change on date generation (if generated on an
other day) and maybe media objects (media paths).

I do not expect a full idem potent after round-trip, but currently we
cannot easily get the differences. I just wanted testing complete XML
migration before major release.


Jérôme


Re: self.db.iter_object_handles(sort_handles=True)

Gerald Britton-2
Agreed. If the export is in handle order we should be fine.
Re-importing, though, can generate new handles, can it not?  If so, we
lose idempotency, which is Jérôme's issue, I think.
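
A small sketch of why this matters, using plain Python containers rather than the Gramps importer. The names import_records, export_records and new_handle are hypothetical, and whether Gramps actually regenerates handles on import is exactly the open question here, so the keep_handles switch is an assumption for illustration.

```python
import itertools

_counter = itertools.count(1)


def new_handle():
    """Hypothetical handle generator; real handles are built differently."""
    return "_new%04d" % next(_counter)


def import_records(records, keep_handles):
    """Load (handle, data) pairs into a fresh dict acting as the database."""
    db = {}
    for handle, data in records:
        if not keep_handles or handle in db:
            handle = new_handle()  # fresh handle: the original key is lost
        db[handle] = data
    return db


def export_records(db):
    """Export in handle order so the output does not depend on storage order."""
    return sorted(db.items())


original = [("_a1", "Alice"), ("_b2", "Bob")]
kept = export_records(import_records(original, keep_handles=True))
regenerated = export_records(import_records(original, keep_handles=False))

print(kept == original)         # round-trip is stable when handles are kept
print(regenerated == original)  # and not stable when they are regenerated
```

With regenerated handles the data is intact but the sort key has changed, so exporting in handle order no longer gives a comparable file.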


--
Sent from my mobile device

Gerald Britton


Re: self.db.iter_object_handles(sort_handles=True)

DS Blank
In reply to this post by Benny Malengier
On Sat, Jan 15, 2011 at 8:44 AM, Benny Malengier
<[hidden email]> wrote:
> We should _never_ order on export.
> We should only access things via an index in the database.

Benny,

If I understand you correctly, you mean: don't sort the export by something
*other* than an index. As long as we have an index to sort by, then we
are fine, right? Or did you mean something else?

-Doug

>> >> malware threats, the impact they can have on your business, and how you
>> >> can protect your company and customers by using code signing.
>> >> http://p.sf.net/sfu/oracle-sfdevnl
>> >> _______________________________________________
>> >> Gramps-devel mailing list
>> >> [hidden email]
>> >> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>> >>
>> >
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Protect Your Site and Customers from Malware Attacks
>> Learn about various malware tactics and how to avoid them. Understand
>> malware threats, the impact they can have on your business, and how you
>> can protect your company and customers by using code signing.
>> http://p.sf.net/sfu/oracle-sfdevnl
>> _______________________________________________
>> Gramps-devel mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>
>

Re: self.db.iter_object_handles(sort_handles=True)

Benny Malengier
In reply to this post by Gerald Britton-2


2011/1/15 Gerald Britton <[hidden email]>
> Agreed. If the export is in handle order we should be fine.
> Re-importing though can generate new handles, can it not?  If so, we
> lose idempotency which is jerome's issue I think.

Reimporting into an empty family tree keeps the handles.

Benny
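
One way to check the claim that a re-import keeps the handles is to compare the set of handles in the file that was imported with the set in the file that was exported. The following is only a minimal sketch: it assumes that every primary object element carries a `handle` attribute, and the two tiny documents are made up for illustration, not real Gramps exports. The helper names `collect_handles` and `handles_preserved` are hypothetical.

```python
import xml.etree.ElementTree as ET


def collect_handles(xml_text):
    """Return the set of all 'handle' attribute values found in xml_text."""
    root = ET.fromstring(xml_text)
    return {el.get("handle") for el in root.iter() if el.get("handle") is not None}


def handles_preserved(before_xml, after_xml):
    """True if both documents contain exactly the same handles, in any order."""
    return collect_handles(before_xml) == collect_handles(after_xml)


# Made-up documents: the same two objects, written out in a different order.
BEFORE = '<database><people><person handle="_a1"/><person handle="_b2"/></people></database>'
AFTER = '<database><people><person handle="_b2"/><person handle="_a1"/></people></database>'

print(handles_preserved(BEFORE, AFTER))
```

Because the comparison uses sets, it ignores the order in which objects were written, so it only answers the question of whether handles survive the round trip.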

Re: self.db.iter_object_handles(sort_handles=True)

DS Blank
In reply to this post by Gerald Britton-2
On Sat, Jan 15, 2011 at 10:03 AM, Gerald Britton
<[hidden email]> wrote:
> Agreed. If the export is in handle order we should be fine.
> Re-importing though can generate new handles, can it not?  If so, we
> lose idempotency which is jerome's issue I think.

That wouldn't be an issue in Jérôme's case, because none of the objects
are new: they keep the same handles.

-Doug
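
If the handles are stable, a comparison keyed on the handle can ignore the export order entirely. The sketch below is one possible approach, not something Gramps provides: it assumes each primary object element has a `handle` attribute, sorts those elements by handle, and only then runs a unified diff. The function names and the sample documents are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def canonical_lines(xml_text):
    """Serialize every handle-bearing element to a string, sorted by handle."""
    root = ET.fromstring(xml_text)
    items = []
    for el in root.iter():
        handle = el.get("handle")
        if handle is not None:
            items.append((handle, ET.tostring(el, encoding="unicode").strip()))
    return [text for _, text in sorted(items)]


def handle_order_diff(before_xml, after_xml):
    """Unified diff of two documents after sorting their objects by handle."""
    return list(
        difflib.unified_diff(
            canonical_lines(before_xml),
            canonical_lines(after_xml),
            fromfile="import.xml",
            tofile="export.xml",
            lineterm="",
        )
    )


# Made-up documents: same families, different order, one changed 'change' value.
BEFORE = '<database><families><family handle="_f1" change="1"/><family handle="_f2" change="1"/></families></database>'
AFTER = '<database><families><family handle="_f2" change="2"/><family handle="_f1" change="1"/></families></database>'

for line in handle_order_diff(BEFORE, AFTER):
    print(line)
```

With this kind of comparison, a pure reordering produces an empty diff, and only real differences, such as a modified change time on a family, show up.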

Re: self.db.iter_object_handles(sort_handles=True)

Benny Malengier
In reply to this post by DS Blank


2011/1/15 Doug Blank <[hidden email]>
> On Sat, Jan 15, 2011 at 8:44 AM, Benny Malengier
> <[hidden email]> wrote:
>> We should _never_ order on export.
>> We should only access things via an index in the database.
>
> Benny,
>
> If I understand what you mean, you mean don't sort export by something
> *other* than an index. As long as we have an index to sort by, then we
> are fine, right? Or did you mean something else?


For 300000 people, not following the index would be best, because then you never hit a database page twice.
When you follow an index, you jump back and forth between database pages, and the data is too large to stay in memory.
Whether following the sorted record key has this effect depends on the bsddb structure.

So an index is good, but not as good as just reading the database table straight out. It depends on how much performance you want.
For Gramps as a desktop application, I can accept that following an index is good enough, even if it is not the best.

Benny
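
The page argument can be illustrated with a toy model. This is only a sketch under strong assumptions: records live on numbered pages, only the most recently read page stays cached, and the layout in `PAGE_OF` is invented. Real bsddb caching is more involved, so the numbers show the effect, not actual costs. The function name `count_page_reads` is hypothetical.

```python
def count_page_reads(page_of, visit_order):
    """Count page fetches when only the most recently read page stays cached."""
    reads = 0
    cached = None
    for handle in visit_order:
        page = page_of[handle]
        if page != cached:
            reads += 1
            cached = page
    return reads


# Hypothetical layout: handle -> number of the page that stores the record.
PAGE_OF = {"_c3": 0, "_a1": 0, "_d4": 1, "_b2": 1}

storage_order = list(PAGE_OF)  # what a plain table cursor would walk through
index_order = sorted(PAGE_OF)  # what a handle-sorted index would walk through

print(count_page_reads(PAGE_OF, storage_order))
print(count_page_reads(PAGE_OF, index_order))
```

In this layout the plain scan fetches each page once, while the sorted walk alternates between the two pages and fetches a page for every record.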
 
-Doug

> Ordering would mean a huge time penalty on exporting for those with very
> large family trees.
> Even exporting along a bsddb index would be much slower, as now we go from
> database page to database page.
>
> Just looping over the data and exporting means the the harddisk is the least
> read (it goes from database page to database page).
>
> In other words:
> 1/ default should be just a cursor of the database table, so order cannot be
> maintained
> 2/ ordered output could be optional. If we add an ordered output, it should
> be along an index page of the database, so no in memory sorting must occur
> before export can be done. I think ID has a sorted index over it. Handle
> normally also, as it is the primary key, and will hence be in some sort of
> B-tree. You must be sure to use the sort index on looping however.
>
> Benny



Re: self.db.iter_object_handles(sort_handles=True)

jerome
In reply to this post by jerome
Sorry, this was not a request to sort handles or to use an index.
It was just the only way I found to make Gramps more idempotent with the Gramps XML file format, which is not bsddb.

True, the title is about iteration and the sort argument, but that was only an illustration related to the ExportXml.py module.

ImportXml.py (or something else) is rewriting some family objects and re-ordering events, notes, places, etc. after a simple Gramps XML round-trip. This cannot affect bsddb performance, right?

Getting a list or iterating was only one way to try to make Gramps more idempotent after a simple Gramps XML round-trip without any DB change made by the user.

$ gramps -i import.gramps -e export.gramps

As a user, I expected "import.gramps = export.gramps".

If this is related to bsddb performance, then it makes me think that Gramps should not use this XML parser!!!
As Gramps uses ImportXml.py, something might be wrong there ...
If writing XML data to bsddb cannot keep the order, then something is strange, because person handles seem to keep their order!
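
One way to check such a round-trip without depending on object order is sketched below. It rests on assumptions that are not confirmed in this thread: that a .gramps file is gzip-compressed XML (as the gunzip commands used earlier suggest), that primary objects are grandchildren of the root element and carry a handle attribute, and that the change time is stored in a change attribute. The function names are made up for illustration.

```python
import gzip
import xml.etree.ElementTree as ET


def load_objects(gramps_bytes, ignore_attrs=("change",)):
    """Map (tag, handle) -> serialized element for every object with a handle.

    Attributes listed in ignore_attrs (assumed here to be the change time)
    are dropped so that they do not show up as differences.
    """
    root = ET.fromstring(gzip.decompress(gramps_bytes))
    objects = {}
    for container in root:          # e.g. <people>, <families>, ...
        for elem in container:      # e.g. <person>, <family>, ...
            handle = elem.get("handle")
            if handle is None:
                continue
            for attr in ignore_attrs:
                elem.attrib.pop(attr, None)
            text = ET.tostring(elem, encoding="unicode").strip()
            objects[(elem.tag, handle)] = text
    return objects


def compare_round_trip(before_bytes, after_bytes):
    """Report objects removed, added, or changed, regardless of file order."""
    before = load_objects(before_bytes)
    after = load_objects(after_bytes)
    common = before.keys() & after.keys()
    return {
        "removed": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
        "changed": sorted(key for key in common if before[key] != after[key]),
    }
```

A call such as compare_round_trip(open("import.gramps", "rb").read(), open("export.gramps", "rb").read()) would then list only the objects whose content really differs. References re-ordered inside an object still count as a change, which is what the round-trip test is trying to detect.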



PS: there is also a performance issue on Gramps XML import (a slowdown: Python 2.6/2.7, bsddb?)


Jérôme




Re: self.db.iter_object_handles(sort_handles=True)

Gerald Britton-2
The handle key is a bsddb hash and is thus stored in random (well,
pseudo-random) order. So export, import, export is unlikely to produce
two identical exports.

OTOH, you could run the XML file through XQuery and use the "order by"
clause to sort it.
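
The same post-processing idea can be sketched in Python instead of XQuery. This is only an illustration: it assumes that objects sit directly under container elements and carry a handle attribute, and the function name is made up. Containers whose children do not all have a handle are left in their original order.

```python
import xml.etree.ElementTree as ET


def sort_by_handle(xml_text):
    """Return the XML with each container's children sorted by handle.

    Only containers in which every child has a handle attribute are
    reordered; anything else keeps its original order.
    """
    root = ET.fromstring(xml_text)
    for container in root:
        children = list(container)
        if children and all(c.get("handle") is not None for c in children):
            children.sort(key=lambda c: c.get("handle"))
            container[:] = children  # replace children in sorted order
    return ET.tostring(root, encoding="unicode")


example = (
    '<database><people>'
    '<person handle="_b2"/><person handle="_a1"/>'
    '</people></database>'
)
print(sort_by_handle(example))
```

Running both the imported and the exported file through such a filter before calling diff removes the differences that come purely from storage order.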

>
>>> >>>> surnames is not a good key :(
>
>>> >>> I can see that some would like it...makes the XML easier to
>
>>> >>> read by a human
>
>>> >>>
>
>>> >>>> date => date_object => year, then month, then
>
>>> >>> day, then rank, etc ... = horrible index
>
>>> >>>
>
>>> >>> Probably, but its just one possibility
>
>>> >>>
>
>>> >>>> My problem is on plugins/export/ExportXML.py
>
>>> >>>>
>
>>> >>>> I saw a sortByID function not used, then sometimes the
>
>>> >>> use of list (get_...), then iteration (only family
>
>>> >>> handles).
>
>>> >>>> I thought on use lists sorted by handle for having an
>
>>> >>> order rule. I do not want to group handles, handles will be
>
>>> >>> grouped into the Gramps XML, so it was not planned to parse
>
>>> >>> one flat XML file or something like that!
>
>>> >>>> But it is not my main problem ...
>
>>> >>>> I thought that to sort handles means objects lists
>
>>> >>> will be consistent (Persons, Families, Events, etc ...)
>
>>> >>>> Every time I import a Gramps XML, Gramps rebuilds
>
>>> >>> (write, DB commit) some objects! Change time is not the same
>
>>> >>> with a simple import then export.
>
>>> >>>
>
>>> >>> Well, they all need new handles, right?  Possibility
>
>>> >>> of collisions.
>
>>> >>> Also with gramps ids.
>
>>> >>>
>
>>> >>>> I can understand the random order used by bsddb, but
>
>>> >>> this should not be done on some objects (like family) and
>
>>> >>> not on the others.
>
>>> >>>> In my mind, an import without DB change is like a
>
>>> >>> "read-only": it is not the case. OK, you are saying that it
>
>>> >>> is the way used by bsddb. XML files should be able to use
>
>>> >>> 'diff' or revision control tools. With current Gramps XML
>
>>> >>> import/export, these tools are limited. :(
>
>>> >>>
>
>>> >>> Yep.  You're probably looking for something like a
>
>>> >>> UUID for each
>
>>> >>> record.  Not a bad idea but not implemented at the
>
>>> >>> moment.
>
>>> >>>
>
>>> >>>>
>
>>> >>>> Jérôme
>
>>> >>>>
>
>>> >>>>
>
>>> >>>> --- En date de : Ven 14.1.11, Gerald Britton
>
>>> >>>> <[hidden email]>
>
>>> >>> a écrit :
>
>>> >>>>> De: Gerald Britton <[hidden email]>
>
>>> >>>>> Objet: Re: [Gramps-devel]
>
>>> >>> self.db.iter_object_handles(sort_handles=True)
>
>>> >>>>> À: "jerome" <[hidden email]>
>
>>> >>>>> Cc: [hidden email]
>
>>> >>>>> Date: Vendredi 14 janvier 2011, 21h21
>
>>> >>>>> On Fri, Jan 14, 2011 at 3:11 PM,
>
>>> >>>>> jerome <[hidden email]>
>
>>> >>>>> wrote:
>
>>> >>>>>> I am not certain to understand ...
>
>>> >>>>>> Keys should be handles, no ?
>
>>> >>>>> Well, that's the question!  I can see a case for
>
>>> >>>>> gramps ids, or
>
>>> >>>>> surnames, or event dates, etc. etc.
>
>>> >>>>>
>
>>> >>>>>>
>
>>> >>> 'self.db.get_{object}_handles(sort_handles=True)' is
>
>>> >>>>> allowed,
>
>>> >>>>>> not
>
>>> >>> 'self.db.iter_{object}_handles(sort_handles=True)'!
>
>>> >>>>>> There is two questions:
>
>>> >>>>>>
>
>>> >>>>>> 1. Why does Gramps only use
>
>>> >>>>> self.db.iter_family_handles(), else
>
>>> >>>>> self.get_{object}_handles(), where {object} is
>
>>> >>> person or
>
>>> >>>>> event or source or place or repository or note or
>
>>> >>> media
>
>>> >>>>> object.
>
>>> >>>>>
>
>>> >>>>> the get_...handles methods return a list, which
>
>>> >>> can be
>
>>> >>>>> expensive in
>
>>> >>>>> memory and must read all objects in one pass.
>
>>> >>> The
>
>>> >>>>> iter... methods
>
>>> >>>>> just return one at at time, so are cheaper in
>
>>> >>> memory.
>
>>> >>>>> So, the iter...
>
>>> >>>>> methods are preferable.  OTOH, they cannot do
>
>>> >>> sorting,
>
>>> >>>>> since by
>
>>> >>>>> definition you need to read all records before you
>
>>> >>> can sort
>
>>> >>>>> them.
>
>>> >>>>>
>
>>> >>>>>> 2. Why 'sort_handles=True' argument is
>
>>> >>> allowed on all
>
>>> >>>>> primary objects except family object ?
>
>>> >>>>>
>
>>> >>>>> I suppose that there has been no requirement so
>
>>> >>> far so no
>
>>> >>>>> one coded it up.
>
>>> >>>>>
>
>>> >>>>>>> The data is not ordered since it
>
>>> >>>>>>> comes from bsddb in random order.
>
>>> >>>>>> This could explain why I will not be able to
>
>>> >>> keep
>
>>> >>>>> order on XML import (to bsddb). :(
>
>>> >>>>>>
>
>>> >>>>>> Thanks.
>
>>> >>>>>> Jérôme
>
>>> >>>>>>
>
>>> >>>>>> --- En date de : Ven 14.1.11, Gerald Britton
>
>>> >>> <[hidden email]>
>
>>> >>>>> a écrit :
>
>>> >>>>>>> De: Gerald Britton <[hidden email]>
>
>>> >>>>>>> Objet: Re: [Gramps-devel]
>
>>> >>>>> self.db.iter_object_handles(sort_handles=True)
>
>>> >>>>>>> À: "jerome" <[hidden email]>
>
>>> >>>>>>> Cc: [hidden email]
>
>>> >>>>>>> Date: Vendredi 14 janvier 2011, 19h53
>
>>> >>>>>>> The data is not ordered since it
>
>>> >>>>>>> comes from bsddb in random order.  If
>
>>> >>>>>>> we ordered it, we would have to sort it
>
>>> >>> by some
>
>>> >>>>> key.
>
>>> >>>>>>> So, if we did,
>
>>> >>>>>>> what keys would you use for:
>
>>> >>>>>>>
>
>>> >>>>>>> person
>
>>> >>>>>>> family
>
>>> >>>>>>> event
>
>>> >>>>>>> source
>
>>> >>>>>>> place
>
>>> >>>>>>> repository
>
>>> >>>>>>> note
>
>>> >>>>>>> media object
>
>>> >>>>>>>
>
>>> >>>>>>> On Fri, Jan 14, 2011 at 1:36 PM, jerome
>
>>> >>> <[hidden email]>
>
>>> >>>>>>> wrote:
>
>>> >>>>>>>> Hi,
>
>>> >>>>>>>>
>
>>> >>>>>>>>
>
>>> >>>>>>>> I am trying to get an answer to a
>
>>> >>> question
>
>>> >>>>> about the
>
>>> >>>>>>> code: why we cannot keep the order of
>
>>> >>> objects
>
>>> >>>>> after a Gramps
>
>>> >>>>>>> XML file import against export ?
>
>>> >>>>>>>> Nick pointed out that objects are
>
>>> >>> not ordered
>
>>> >>>>> on
>
>>> >>>>>>> export[1].
>
>>> >>>>>>>> Why ? I suppose backup scripts or
>
>>> >>> revision
>
>>> >>>>> control
>
>>> >>>>>>> tools will work better with ordered
>
>>> >>> objects!
>
>>> >>>>> Anyway, to use
>
>>> >>>>>>> 'sort_handles=True' works on export,
>
>>> >>> except for
>
>>> >>>>> family
>
>>> >>>>>>> handles. Any reason for that ? A typo
>
>>> >>> somewhere ?
>
>>> >>>>> On my side
>
>>> >>>>>>> ?
>
>>> >>>>>>>>
>
>>> >>>>>>>> [1] http://www.gramps-project.org/bugs/view.php?id=4365
>
>>> >>>>>>>>
>
>>> >>>>>>>> regards,
>
>>> >>>>>>>> Jérôme
>
>>> >>>>>>>>
>
>>> >>>>>>>>
>
>>> >>>>>>>>
>
>>> >>>>>>>>
>
>>> >>>>>>>>
>
>>> >>>
>
>>> >>> ------------------------------------------------------------------------------
>
>>> >>>>>>>> Protect Your Site and Customers from
>
>>> >>> Malware
>
>>> >>>>> Attacks
>
>>> >>>>>>>> Learn about various malware tactics
>
>>> >>> and how
>
>>> >>>>> to avoid
>
>>> >>>>>>> them. Understand
>
>>> >>>>>>>> malware threats, the impact they can
>
>>> >>> have on
>
>>> >>>>> your
>
>>> >>>>>>> business, and how you
>
>>> >>>>>>>> can protect your company and
>
>>> >>> customers by
>
>>> >>>>> using code
>
>>> >>>>>>> signing.
>
>>> >>>>>>>> http://p.sf.net/sfu/oracle-sfdevnl
>
>>> >>>>>>>>
>
>>> >>>>> _______________________________________________
>
>>> >>>>>>>> Gramps-devel mailing list
>
>>> >>>>>>>> [hidden email]
>
>>> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>
>>> >>>>>>>>
>
>>> >>>>>>>
>
>>> >>>>>>>
>
>>> >>>>>>> --
>
>>> >>>>>>> Gerald Britton
>
>>> >>>>>>>
>
>>> >>>>>>
>
>>> >>>>>>
>
>>> >>>>>>
>
>>> >>>>>
>
>>> >>>>>
>
>>> >>>>> --
>
>>> >>>>> Gerald Britton
>
>>> >>>>>
>
>>> >>>>
>
>>> >>>>
>
>>> >>>>
>
>>> >>>
>
>>> >>>
>
>>> >>> --
>
>>> >>> Gerald Britton
>
>>> >>>
>
>>> >>
>
>>> >>
>
>>> >>
>
>>> >>
>
>>> >> ------------------------------------------------------------------------------
>
>>> >> Protect Your Site and Customers from Malware Attacks
>
>>> >> Learn about various malware tactics and how to avoid them. Understand
>
>>> >> malware threats, the impact they can have on your business, and how
>>> >> you
>
>>> >> can protect your company and customers by using code signing.
>
>>> >> http://p.sf.net/sfu/oracle-sfdevnl
>
>>> >> _______________________________________________
>
>>> >> Gramps-devel mailing list
>
>>> >> [hidden email]
>
>>> >> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>
>>> >>
>
>>> >
>
>>>
>
>>>
>
>>>
>
>>> ------------------------------------------------------------------------------
>
>>> Protect Your Site and Customers from Malware Attacks
>
>>> Learn about various malware tactics and how to avoid them. Understand
>
>>> malware threats, the impact they can have on your business, and how you
>
>>> can protect your company and customers by using code signing.
>
>>> http://p.sf.net/sfu/oracle-sfdevnl
>
>>> _______________________________________________
>
>>> Gramps-devel mailing list
>
>>> [hidden email]
>
>>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>
>>
>
>>
>
>
>
>
>
>
>
>

--
Sent from my mobile device

Gerald Britton


Re: self.db.iter_object_handles(sort_handles=True)

jerome
In reply to this post by jerome
>> 3. make it so that the order is preserved
>>
>> I would lean towards #3. I've "fixed" some other places
>> where the order was lost. If you let me know which orders are lost, I'll address.
>
> At glance, I will say events, notes, places.

Sorry, the order is lost only for events, families and a few persons; see
diff.txt on:
http://www.gramps-project.org/bugs/view.php?id=4365
Maybe places and notes as well on my primary database?

Is it possible that there is a cache on import and that, when there are a lot
of handles, the cache is committed like a direct write (so the order is not
kept on import)? This could explain why long lists of objects end up
upside down, and only for a large database or a large group of objects.
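
To see which objects really changed rather than just moved, both files can be
put into a canonical order before running diff. What follows is only a minimal
sketch, assuming each primary object in the exported XML is an element with a
'handle' attribute. The names sort_by_handle and normalized are made up for
this illustration and are not part of Gramps. Change times are left alone, so
rewritten families would still show up in the diff.

```python
import xml.etree.ElementTree as ET


def sort_by_handle(element):
    """Recursively sort sibling elements that all carry a 'handle' attribute."""
    for child in element:
        sort_by_handle(child)
    children = list(element)
    # Only reorder lists of primary objects; reference lists (no 'handle'
    # attribute) keep their original, possibly meaningful, order.
    if children and all("handle" in child.attrib for child in children):
        children.sort(key=lambda child: child.get("handle"))
        for child in children:
            # Whitespace after an element travels with it, so make it uniform.
            child.tail = "\n"
        element[:] = children


def normalized(xml_text):
    """Return the XML text with handle-bearing siblings in handle order."""
    root = ET.fromstring(xml_text)
    sort_by_handle(root)
    return ET.tostring(root, encoding="unicode")
```

Running normalized() on the contents of import.xml and export.xml and diffing
the two results should then hide pure reordering of the object lists, while
still showing any object whose content differs.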


Jérôme


jerome wrote:

>> Yes, it would be handy to do this. This might be called "idempotent"
>> by a mathematician: if the round-trip through gramps was idempotent,
>> then the diff would be empty.
>
> That's exactly what I tried to do.
> I learned one word! :)
> Thanks!
>
>> 3. make it so that the order is preserved
>>
>> I would lean towards #3. I've "fixed" some other places
>> where the order was lost. If you let me know which orders are lost, I'll address.
>
> At glance, I will say events, notes, places.
>
> But there is something else:
> 1. some families are re-written (change time)
> 2. small samples do not reorder ! cache limit ?
>
> http://www.gramps-project.org/bugs/view.php?id=4365
>
>
> Jérôme



Re: self.db.iter_object_handles(sort_handles=True)

jerome
In reply to this post by Gerald Britton-2
Gerald,


Could bsddb hash collisions be slowing down the import?

I suspect I have a DB synchronization issue [1].
Is there anything like fsync()?
http://www.bash-linux.com/unix-man-fsync.html
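
On the fsync() question: Python does expose the POSIX call as os.fsync().
Below is a minimal sketch on a plain file. The name write_durably is invented
for this illustration; whether the bsddb layer used by Gramps needs such a
call, or already does the equivalent, is not something this sketch shows.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and ask the OS to push them to the storage device."""
    with open(path, "wb") as stream:
        stream.write(data)
        stream.flush()             # empty Python's own buffer into the OS
        os.fsync(stream.fileno())  # ask the OS to flush its cache to disk


# Example use on a throwaway file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
write_durably(demo_path, b"gramps")
```

Calling flush() first matters: os.fsync() only acts on data that has already
left the Python-level buffer.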

Why is the pseudo-random key order (on output) not used for all tables, or
at least not visible (on import/export) for some tables [2]?


[1] http://www.gramps-project.org/bugs/view.php?id=4428
[2] http://www.gramps-project.org/bugs/view.php?id=4365


thanks,
Jérôme



Gerald Britton wrote:

> The handle key is a bsddb hash and is thus stored pseudo-randomly.
> So export, import, export is not likely to produce two identical
> exports.
>
> OTOH, you could run the XML file through an XML query tool (e.g. XQuery)
> and use its "order by" clause to sort it.
>
> On 1/15/11, jerome <[hidden email]> wrote:
>> Sorry, it was not a request to sort handles or to use an index.
>> It was just the only way to make Gramps more idempotent with the
>> Gramps XML file format, which is not bsddb.
>>
>> True, the title is about iteration and the sort argument, but that was an
>> illustration related to the ExportXml.py module.
>>
>> ImportXml.py (or something else) is rewriting some family objects and
>> reordering events, notes, places, etc. after a simple Gramps XML
>> round-trip. This cannot affect bsddb performance, right?
>>
>> Getting a list or iterating was only one way to try to make Gramps more
>> idempotent after a simple Gramps XML round-trip without any DB change made
>> by the user.
>>
>> $ gramps -i import.gramps -e export.gramps
>>
>> As a user, I expected "import.gramps = export.gramps".
>>
>> If this is related to bsddb performance, then it makes me think that
>> Gramps should not use this XML parser!!!
>> As Gramps uses ImportXml.py, something might be wrong here ...
>> If writing XML data to bsddb cannot keep the order, then something
>> is strange, because person handles seem to keep their order!
>>
>> PS: there is also a performance issue on Gramps XML import (slowdown:
>> python 2.6/7, bsddb?)
>>
>>
>> Jérôme
>>
>>
>> --- On Sat 15.1.11, Benny Malengier <[hidden email]> wrote:
>>
>> From: Benny Malengier <[hidden email]>
>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
>> To: "Doug Blank" <[hidden email]>
>> Cc: [hidden email], [hidden email], "Gerald Britton" <[hidden email]>
>> Date: Saturday 15 January 2011, 16:13
>>
>> 2011/1/15 Doug Blank <[hidden email]>
>>
>> On Sat, Jan 15, 2011 at 8:44 AM, Benny Malengier <[hidden email]> wrote:
>>
>>> We should _never_ order on export.
>>> We should only access things via an index in the database.
>>
>> Benny,
>>
>> If I understand what you mean, you mean don't sort export by something
>> *other* than an index. As long as we have an index to sort by, then we
>> are fine, right? Or did you mean something else?
>>
>> For 300000 people, not following the index would be best, as then you don't
>> hit a database page twice.
>> When you follow an index, you will jump between your database pages, and
>> the data is too large to stay in memory.
>>
>> Whether following the sorted record key has this effect depends on the
>> bsddb structure.
>>
>> So, an index is good, but not as good as just reading the database table
>> out. It depends on how much performance you want.
>> For Gramps as a desktop application, I can accept that following an index
>> is good enough, even if not the best.
>>
>> Benny
>>
>> -Doug
>>
>>> Ordering would mean a huge time penalty on export for those with very
>>> large family trees.
>>> Even exporting along a bsddb index would be much slower, as we would then
>>> jump from database page to database page.
>>> Just looping over the data and exporting means the hard disk is read the
>>> least (it goes from one database page to the next).
>>> In other words:
>>> 1/ default should be just a cursor of the database table, so order cannot
>>> be maintained
>>> 2/ ordered output could be optional. If we add an ordered output, it
>>> should be along an index page of the database, so that no in-memory
>>> sorting must occur before export can be done. I think ID has a sorted
>>> index over it. Handle normally also, as it is the primary key, and will
>>> hence be in some sort of B-tree. You must be sure to use the sort index
>>> on looping however.
>>> Benny
>>> 2011/1/15 Jérôme <[hidden email]>
>>>>> if the round-trip through gramps was idempotent, then the diff would be
>>>>> empty.
>>>> Expected result was: minor change on date generation (if generated on an
>>>> other day) and maybe media objects (media paths).
>>>> I do not expect a full idem potent after round-trip, but currently we
>>>> cannot easily get the differences. I just wanted testing complete XML
>>>> migration before major release.
>>>> Jérôme
>>>> Doug Blank a écrit :
>>>>> On Fri, Jan 14, 2011 at 4:31 PM, jerome <[hidden email]> wrote:
>>>>>>>> gramps ids could be exotic!
>>>>>>> Do you mean unique?  Anyway it is a good sort-key
>>>>>>> candidate
>>>>>> ids = [I000001, IAYUTRE235, zharb, /empty/ , etc ...]
>>>>>> In 'handle' I trust! ;)
>>>>>>>> Every time I import a Gramps XML, Gramps rebuilds
>>>>>>> (write, DB commit) some objects! Change time is not the same
>>>>>>> with a simple import then export.
>>>>>>> Well, they all need new handles, right?  Possibility
>>>>>>> of collisions.
>>>>>>> Also with gramps ids.
>>>>>> In fact, I want to keep handles: they should be the keys control.
>>>>>> My problem could be illustrated by something like:
>>>>>> $ gramps -i import.gramps -e export.gramps
>>>>>> $ gunzip < import.gramps > import.xml
>>>>>> $ gunzip < export.gramps > export.xml
>>>>>> $ diff -u import.xml export.xml > diff.txt
>>>>>> where import.gramps is our "Scientific control".
>>>>>> What should be the content of diff.txt ?
>>>>>> For me, it should be few lines...
>>>>>> Unfortunatly there is some change (order, change time on family
>>>>>> objects): that's strange!
>>>>> Yes, it would be handy to do this. This might be called "idempotent"
>>>>> by a mathematician: if the round-trip through gramps was idempotent,
>>>>> then the diff would be empty.
>>>>> What we need is:
>>>>> 1. something smarter than diff for this usage
>>>>> 2. sort on something that doesn't change (like the handle), just for
>>>>> this purpose
>>>>> 3. make it so that the order is preserved
>>>>> I would lean towards #3. I've "fixed" some other places where the
>>>>> order was lost. If you let me know which orders are lost, I'll
>>>>> address.
>>>>> -Doug
>>>>>> Jérôme
>>>>>> --- En date de : Ven 14.1.11, Gerald Britton
>>>>>> <[hidden email]>
>>>>>> a écrit :
>>>>>>> De: Gerald Britton <[hidden email]>
>>>>>>> Objet: Re: [Gramps-devel]
>>>>>>> self.db.iter_object_handles(sort_handles=True)
>>>>>>> À: "jerome" <[hidden email]>
>>>>>>> Cc: [hidden email]
>>>>>>> Date: Vendredi 14 janvier 2011, 22h10
>>>>>>> On Fri, Jan 14, 2011 at 3:59 PM,
>>>>>>> jerome <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>> I am not certain to understand ...
>>>>>>>>>> Keys should be handles, no ?
>>>>>>>>> Well, that's the question!  I can see a case for
>>>>>>>>> gramps ids, or
>>>>>>>>> surnames, or event dates, etc. etc.
>>>>>>>> But the handle is the easiest and safest key for ordering our data.
>>>>>>> Only if that's the order you want
>>>>>>>> gramps ids could be exotic!
>>>>>>> Do you mean unique?  Anyway it is a good sort-key candidate
>>>>>>>> surnames is not a good key :(
>>>>>>> I can see that some would like it...makes the XML easier to
>>>>>>> read by a human
>>>>>>>> date => date_object => year, then month, then day, then rank,
>>>>>>>> etc ... = horrible index
>>>>>>> Probably, but it's just one possibility
>>>>>>>> My problem is on plugins/export/ExportXML.py
>>>>>>>> I saw a sortByID function not used, then sometimes the use of
>>>>>>>> list (get_...), then iteration (only family handles).
>>>>>>>> I thought of using lists sorted by handle to have an ordering
>>>>>>>> rule. I do not want to group handles, handles will be grouped
>>>>>>>> into the Gramps XML, so it was not planned to parse one flat
>>>>>>>> XML file or something like that!
>>>>>>>> But it is not my main problem ...
>>>>>>>> I thought that to sort handles means objects lists will be
>>>>>>>> consistent (Persons, Families, Events, etc ...)
>>>>>>>> Every time I import a Gramps XML, Gramps rebuilds (write, DB
>>>>>>>> commit) some objects! Change time is not the same with a
>>>>>>>> simple import then export.
>>>>>>> Well, they all need new handles, right?  Possibility of collisions.
>>>>>>> Also with gramps ids.
>>>>>>>> I can understand the random order used by bsddb, but this
>>>>>>>> should not be done on some objects (like family) and not on
>>>>>>>> the others.
>>>>>>>> In my mind, an import without DB change is like a "read-only":
>>>>>>>> it is not the case. OK, you are saying that it is the way used
>>>>>>>> by bsddb. XML files should be able to use 'diff' or revision
>>>>>>>> control tools. With current Gramps XML import/export, these
>>>>>>>> tools are limited. :(
>>>>>>> Yep.  You're probably looking for something like a UUID for each
>>>>>>> record.  Not a bad idea but not implemented at the moment.
>>>>>>>> Jérôme
>>>>>>>> Jérôme
>>>>>>>> --- On Fri 14.1.11, Gerald Britton <[hidden email]> wrote:
>>>>>>>>> From: Gerald Britton <[hidden email]>
>>>>>>>>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
>>>>>>>>> To: "jerome" <[hidden email]>
>>>>>>>>> Cc: [hidden email]
>>>>>>>>> Date: Friday 14 January 2011, 21:21
>>>>>>>>> On Fri, Jan 14, 2011 at 3:11 PM, jerome <[hidden email]> wrote:
>>>>>>>>>> I am not certain to understand ...
>>>>>>>>>> Keys should be handles, no ?
>>>>>>>>> Well, that's the question!  I can see a case for
>>>>>>>>> gramps ids, or
>>>>>>>>> surnames, or event dates, etc. etc.
>>>>>>>>>> 'self.db.get_{object}_handles(sort_handles=True)' is allowed,
>>>>>>>>>> not 'self.db.iter_{object}_handles(sort_handles=True)'!
>>>>>>>>>> There are two questions:
>>>>>>>>>> 1. Why does Gramps only use self.db.iter_family_handles(), else
>>>>>>>>>> self.get_{object}_handles(), where {object} is person or event
>>>>>>>>>> or source or place or repository or note or media object.
>>>>>>>>> the get_...handles methods return a list, which can be
>>>>>>>>> expensive in memory and must read all objects in one pass.
>>>>>>>>> The iter... methods just return one at a time, so are cheaper
>>>>>>>>> in memory.  So, the iter... methods are preferable.  OTOH,
>>>>>>>>> they cannot do sorting, since by definition you need to read
>>>>>>>>> all records before you can sort them.
>>>>>>>>>> 2. Why is the 'sort_handles=True' argument allowed on all
>>>>>>>>>> primary objects except the family object ?
>>>>>>>>> I suppose that there has been no requirement so far, so no
>>>>>>>>> one coded it up.
>>>>>>>>>>> The data is not ordered since it
>>>>>>>>>>> comes from bsddb in random order.
>>>>>>>>>> This could explain why I will not be able to keep
>>>>>>>>>> order on XML import (to bsddb). :(
>>>>>>>>>> Thanks.
>>>>>>>>>> Jérôme
>>>>>>>>>> --- On Fri 14.1.11, Gerald Britton <[hidden email]> wrote:
>>>>>>>>>>> From: Gerald Britton <[hidden email]>
>>>>>>>>>>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
>>>>>>>>>>> To: "jerome" <[hidden email]>
>>>>>>>>>>> Cc: [hidden email]
>>>>>>>>>>> Date: Friday 14 January 2011, 19:53
>>>>>>>>>>> The data is not ordered since it comes from bsddb in random
>>>>>>>>>>> order.  If we ordered it, we would have to sort it by some
>>>>>>>>>>> key.  So, if we did, what keys would you use for:
>>>>>>>>>>> person
>>>>>>>>>>> family
>>>>>>>>>>> event
>>>>>>>>>>> source
>>>>>>>>>>> place
>>>>>>>>>>> repository
>>>>>>>>>>> note
>>>>>>>>>>> media object
>>>>>>>>>>> On Fri, Jan 14, 2011 at 1:36 PM, jerome <[hidden email]> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I am trying to get an answer to a question about the code:
>>>>>>>>>>>> why we cannot keep the order of objects after a Gramps XML
>>>>>>>>>>>> file import against export ?
>>>>>>>>>>>> Nick pointed out that objects are not ordered on export[1].
>>>>>>>>>>>> Why ? I suppose backup scripts or revision control tools
>>>>>>>>>>>> will work better with ordered objects! Anyway, to use
>>>>>>>>>>>> 'sort_handles=True' works on export, except for family
>>>>>>>>>>>> handles. Any reason for that ? A typo somewhere ? On my side ?
>>>>>>>>>>>> [1] http://www.gramps-project.org/bugs/view.php?id=4365
>>>>>>>>>>>> regards,
>>>>>>>>>>>> Jérôme
>>>>>>>>>>> --
>>>>>>>>>>> Gerald Britton
>>>>>>>>> --
>>>>>>>>> Gerald Britton
>>>>>>> --
>>>>>>> Gerald Britton
>
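
The get_... versus iter_... trade-off discussed in the quoted messages can be illustrated with a minimal sketch. FakeDb below is a hypothetical stand-in, not the real Gramps database class; its method names only mirror the ones mentioned in this thread, and sorting by the handle string is an assumption (the thread leaves the choice of sort key open).

```python
import random


class FakeDb:
    """Hypothetical stand-in for a Gramps database; NOT the real API."""

    def __init__(self, handles):
        self._handles = list(handles)
        # Mimic a backend that hands records back in no particular order.
        random.shuffle(self._handles)

    def get_family_handles(self, sort_handles=False):
        # Builds the whole list in memory, so it can be sorted.
        handles = list(self._handles)
        if sort_handles:
            handles.sort()  # assumption: sort key is the handle string itself
        return handles

    def iter_family_handles(self):
        # Yields one handle at a time; no ordering is possible here.
        for handle in self._handles:
            yield handle


db = FakeDb(["c3", "a1", "b2"])

# List-based access can sort directly.
as_list = db.get_family_handles(sort_handles=True)

# Iterator-based access has to be materialized before sorting,
# which gives up the memory advantage of iterating.
from_iter = sorted(db.iter_family_handles())

print(as_list)
print(from_iter)
```

Either way, a sorted export needs every handle read before the first one is written, so the memory saving of the iterator only applies to unordered output.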

