Towards database synchronizing

DS Blank
Devs,

First, http://Gramps-Connect.org is coming along (and kept up-to-date online):

* can browse all data
* can edit all of the core data on all of the main objects
* can delete all of the main objects (currently just deletes, no
warning, no undo)
* can edit main parts of names, surnames
* can run any report, import, and export from the web (no tools though)
* three levels of permissions:
** not logged in: only see non-private, non-living data
** logged in, but not superuser: can see all data, export and run reports
** logged in, superuser: can edit, delete, import data
* can edit notes with markup
* can add children to families, events to people, etc
* can change CSS of site
* can change site name

This is of course still very much alpha, but I've put my family tree
online and have started doing simple edits. It tastes like dogfood,
but either I'm getting used to it, or it gets a little better every day
:)

One of the first things that one wants to do is merge the changes made
on-line with a master database. We all have made some initial notes
here:

http://gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge
http://www.gramps-project.org/bugs/view.php?id=5853
http://www.gramps-project.org/bugs/view.php?id=2623

Now, I'm seriously thinking about how to do this, perhaps starting
with something simple. I'm thinking that there are three different
analogies:

1) diff and patch: keep track of all edits, deletes, and additions and
create a type of patch file that gets applied to another database.

2) subversion: there is a master database, and all patches are
incorporated there. A special re-sync could get sent out to
checked-out versions.

3) git: all databases are full repositories, and can be forked, merged.

Perhaps starting with #1 is the easiest, and could lead to the others.
But even with that option, there appears to be additional data that
needs to be kept. For example, say I delete an object in a database;
how do I keep track of that, to be able to send it to the other
database?

At a bare minimum, it seems like we need a persistent representation of:

date-time, object-type, handle, change-made, commit-message
2012/6/1 11:00:00, person, 34763478324, deleted, Duplicate person
2012/6/1 12:00:00, person, 23984737847, created, Research on Monday
2012/6/1 12:00:00, source, 38734763786, created, Research on Monday
2012/6/1 12:00:00, citation, 34834767346, created, Research on Monday
2012/6/1 12:01:00, person, 23984737847, edited, Typo on given name

(The commit message is not strictly needed, but I am finding it to be
quite useful.) From this it seems that a patch-like file (written in
XML?) could be made, given a start date-time. Applying the patch may
run into conflicts, but that is a separate issue, I think. We could
also include more information here in the persistent storage (for
example, before and after serializations).
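As a rough sketch of how such a log could feed a patch, assuming only the five fields listed above: the snippet parses the example log lines and keeps the entries made at or after a given start date-time. The names `ChangeRecord`, `parse_log`, and `changes_since` are hypothetical and are not existing Gramps code.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record type mirroring the five columns of the log.
@dataclass(frozen=True)
class ChangeRecord:
    when: datetime
    obj_type: str
    handle: str
    change: str      # "created", "edited", or "deleted"
    message: str

def parse_log(text):
    """Parse 'date-time, object-type, handle, change-made, commit-message' lines."""
    records = []
    for line in text.strip().splitlines():
        # Split on the first four commas only, so the message may contain commas.
        when, obj_type, handle, change, message = [
            part.strip() for part in line.split(",", 4)]
        records.append(ChangeRecord(
            datetime.strptime(when, "%Y/%m/%d %H:%M:%S"),
            obj_type, handle, change, message))
    return records

def changes_since(records, start):
    """Return the records made at or after `start`, oldest first."""
    return sorted((r for r in records if r.when >= start), key=lambda r: r.when)

LOG = """\
2012/6/1 11:00:00, person, 34763478324, deleted, Duplicate person
2012/6/1 12:00:00, person, 23984737847, created, Research on Monday
2012/6/1 12:00:00, source, 38734763786, created, Research on Monday
2012/6/1 12:00:00, citation, 34834767346, created, Research on Monday
2012/6/1 12:01:00, person, 23984737847, edited, Typo on given name
"""

# Everything from noon onward would go into the patch.
patch = changes_since(parse_log(LOG), datetime(2012, 6, 1, 12, 0, 0))
for record in patch:
    print(record.change, record.obj_type, record.handle)
```

The selected records only say which objects changed; the patch file itself would still need the object data (or before/after serializations) for each handle.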

So, if this sounds correct, where/how should the data be stored?
Perhaps just a text file that we can append onto would be safe and
sturdy? Should we reuse the XML representation of the data for the
patch? That sounds best, as we already can read/write those. But a
JSON file would be easy, too. (We could just use raw Python
serialization, but that could get messy when dealing with database
upgrades.)
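One possible shape for the append-only text file, sketched with JSON (one object per line) rather than XML purely for brevity. The field names and the functions `append_change` and `read_changes` are assumptions for illustration, not part of Gramps.

```python
import json
import os
import tempfile

def append_change(path, when, obj_type, handle, change, message=""):
    """Append one change as a single JSON line; existing lines are never rewritten."""
    entry = {"when": when, "type": obj_type, "handle": handle,
             "change": change, "message": message}
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")

def read_changes(path):
    """Read the log back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]

# Example using a temporary directory; where a real log would live is left open.
log_path = os.path.join(tempfile.mkdtemp(), "changes.jsonl")
append_change(log_path, "2012-06-01T11:00:00", "person", "34763478324",
              "deleted", "Duplicate person")
append_change(log_path, "2012-06-01T12:01:00", "person", "23984737847",
              "edited", "Typo on given name")
entries = read_changes(log_path)
print(len(entries), entries[0]["change"])
```

Because each entry is a complete line, a crash while writing can at worst damage the last line, which fits the "safe and sturdy" goal.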

Comments, ideas welcomed,

-Doug

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Re : Towards database synchronizing

jerome
Doug,

> * can run any report, import, and export from the web (no
> tools though)

I just noticed a difference in behavior between a new, empty SQL database and the example SQL database! It complains about the 'iff' argument on some textual reports, or about the /tmp folder for import or export. I thought it was related to the ID value (starting person) or something like that, but the error messages suggested it was rather a coding issue. So I was surprised to see that textual reports worked fine after running 'make example'!

> * can change CSS of site

There is still a cosmetic 'hardcoded' style issue with the div content block.
Displaying borders for the content block makes it easy to see the float and position settings (gramps-base.html). So this might help to improve the current common style for 'content' in pages, without specific tools or style editors!

> One of the first things that one wants to do is merge the
> changes made on-line with a master database

Yes, this could be another great feature.
As for ways of providing this, I have no additional ideas.

You are describing what I would like to see and how this could be handled!
:)


Thank you.
Jérôme

Re: Re : Towards database synchronizing

DS Blank
On Mon, Jun 18, 2012 at 11:59 AM, jerome <[hidden email]> wrote:
> Doug,
>
>> * can run any report, import, and export from the web (no
>> tools though)
>
> I just noticed a difference in behavior between a new, empty SQL database and the example SQL database! It complains about the 'iff' argument on some textual reports, or about the /tmp folder for import or export. I thought it was related to the ID value (starting person) or something like that, but the error messages suggested it was rather a coding issue. So I was surprised to see that textual reports worked fine after running 'make example'!
>

I think I stick an "iff" flag on all commands, and some don't like
that, but they just ignore it (yes?). It is just a hack right now.
We'll want to use stored options at some point.

But yes it is quite nice that you can import any file (GEDCOM, XML,
others) over the web, and immediately run any report, which downloads
(by default) as a PDF.

>> * can change CSS of site
>
> There is still a cosmetic 'hardcoded' style issue with the div content block.
> Displaying borders for the content block makes it easy to see the float and position settings (gramps-base.html). So this might help to improve the current common style for 'content' in pages, without specific tools or style editors!

(We'll take care of this in the bug tracker... thanks for working on this!)

>> One of the first things that one wants to do is merge the
>> changes made on-line with a master database
>
> Yes, this could be another great feature.
> As for ways of providing this, I have no additional ideas.
>
> You are describing what I would like to see and how this could be handled!
> :)

Good... let's figure out how to do that!

-Doug


Re: Towards database synchronizing

robhealey1
In reply to this post by DS Blank
Greetings:

As complex as this is, I do not have much to say!  The only comment that I have is:

I would like to see the changes, edits, modifications, etc. to the database kept in a git repository...

But then we must think of the normal user too: are they going to be able to use our software if we add an increased level of technology to it?

Will they be willing to use it if they do NOT understand what a git repo is...

I was also thinking that for a user to make use of gramps-connect in the first place would require a certain level of technical skill anyway!

Sincerely yours,
Rob G. Healey


Re: Towards database synchronizing

DS Blank
On Mon, Jun 18, 2012 at 11:27 PM, Rob Healey <[hidden email]> wrote:

> Greetings:
>
> As complex as this is, I do not have much to say!  The only comment that I
> have is:
>
> I would like to see the changes, edits, modifications, etc. to the database
> kept in a git repository...
>
> But then we must think of the normal user too: are they going to be able to
> use our software if we add an increased level of technology to it?
>
> Will they be willing to use it if they do NOT understand what a git
> repo is...
>
> I was also thinking that for a user to make use of gramps-connect in the
> first place would require a certain level of technical skill anyway!

Rob,

I don't think we want to use git... that was just an analogy for
having the same kinds of functionality. For example, any git
repository, fork or original, contains the full history.

One can use git now, if you want to check your XML file in. But that
doesn't help at the object level. For example, if you changed a date
on an Event and you used git (or some other text-based revision
system), you'd have to export the full XML and make the commit, which
would compute the diff over the whole file. In the scenario I am
proposing, a Gramps-patch would know which object was changed, and
could just update that single object.
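To illustrate the object-level idea, here is a minimal sketch in which the database is just a dict keyed by handle and a patch is a list of `(change, handle, data)` tuples. Both the layout and the function `apply_patch` are hypothetical; Gramps' real database API is not used here.

```python
def apply_patch(db, patch):
    """Apply object-level changes to `db`, a dict mapping handle -> object data.

    Each patch entry is (change, handle, data); only the named object is touched.
    Returns the handles that could not be applied cleanly.
    """
    conflicts = []
    for change, handle, data in patch:
        if change == "created" and handle not in db:
            db[handle] = data
        elif change == "edited" and handle in db:
            db[handle] = data
        elif change == "deleted" and handle in db:
            del db[handle]
        else:
            # For example: editing or deleting an object the target never had.
            conflicts.append(handle)
    return conflicts

db = {"E1": {"type": "Event", "date": "1900-01-01"},
      "P1": {"type": "Person", "name": "Smith"}}
patch = [("edited", "E1", {"type": "Event", "date": "1901-01-01"}),
         ("deleted", "P9", None)]
conflicts = apply_patch(db, patch)
print(db["E1"]["date"], conflicts)
```

Only the Event is rewritten; the Person is left alone, and the unknown handle is reported rather than silently ignored.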

I wouldn't want to use a system that is too complex. I imagine that
there will only be a couple of options: "Update from master" (aka, svn
update), and "Commit Changes" (aka, svn commit).

-Doug

Re: Towards database synchronizing

Benny Malengier
Deleted objects could be just a table with
handle, object_type, date_delete.

Sync would be different from import. With sync, I understand that we can assume database handle values to be identical. This is simpler than merge, where we don't have equal handles!

Sync has a target database that needs updating, and an origin database from which we update.
There could be a date: sync changes since _date_

All objects with a change date after _date_ have been updated (or created); the deleted objects table holds the objects which have been deleted.
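A small sketch of that idea using SQLite from the Python standard library. The table and column layout (`object`, `deleted`, `change_date`, `date_delete`) and the function `sync_candidates` are assumptions made for this example, not the actual Gramps schema. Dates are stored as ISO-style text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one table of live objects, one of deletions.
    CREATE TABLE object (handle TEXT PRIMARY KEY, object_type TEXT, change_date TEXT);
    CREATE TABLE deleted (handle TEXT PRIMARY KEY, object_type TEXT, date_delete TEXT);
""")
conn.executemany("INSERT INTO object VALUES (?, ?, ?)", [
    ("23984737847", "person", "2012-06-01 12:01:00"),
    ("38734763786", "source", "2012-06-01 12:00:00"),
    ("11111111111", "person", "2012-05-01 09:00:00"),
])
conn.execute("INSERT INTO deleted VALUES (?, ?, ?)",
             ("34763478324", "person", "2012-06-01 11:00:00"))

def sync_candidates(conn, date):
    """Return (updated_or_created, deleted) handles changed after `date`."""
    changed = [row[0] for row in conn.execute(
        "SELECT handle FROM object WHERE change_date > ? ORDER BY handle", (date,))]
    deleted = [row[0] for row in conn.execute(
        "SELECT handle FROM deleted WHERE date_delete > ? ORDER BY handle", (date,))]
    return changed, deleted

changed, deleted = sync_candidates(conn, "2012-06-01 00:00:00")
print(changed, deleted)
```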

Sync is then:
1. backup revision of the family tree before sync
2. new objects => mark for insert
3. deleted objects, no change locally after delete date => mark for deletion
4. deleted objects, changed locally => mark for user confirmation of deletion
5. updated objects => diff the two versions, mark origin values as new data
6. give overview to user on what will happen, ask for confirmation
7. do the sync
8. Tell user to do the inverse sync on the origin database so as to sync your local changes.
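Steps 2 to 5 above can be sketched as a planning function that only classifies the changes and touches nothing, so that its result can serve as the overview in step 6. The data layout (dicts keyed by handle, ISO date strings) and the name `plan_sync` are assumptions for illustration.

```python
def plan_sync(origin, origin_deleted, target, since):
    """Classify changes from `origin` for a `target` sharing the same handles.

    origin, target: dict handle -> (change_date, data)
    origin_deleted: dict handle -> date_delete
    Dates are ISO strings, so plain string comparison orders them.
    """
    plan = {"insert": [], "delete": [], "confirm_delete": [], "update": []}
    for handle, (changed, data) in origin.items():
        if changed <= since:
            continue                                  # untouched since last sync
        if handle not in target:
            plan["insert"].append(handle)             # step 2
        elif target[handle][1] != data:
            plan["update"].append(handle)             # step 5
    for handle, deleted_on in origin_deleted.items():
        if deleted_on <= since or handle not in target:
            continue
        if target[handle][0] > deleted_on:
            plan["confirm_delete"].append(handle)     # step 4: changed locally
        else:
            plan["delete"].append(handle)             # step 3
    return plan

origin = {"A": ("2012-06-02", "new person"),
          "B": ("2012-06-03", "fixed name"),
          "C": ("2012-01-01", "old")}
origin_deleted = {"D": "2012-06-02", "E": "2012-06-02"}
target = {"B": ("2012-05-01", "typo name"),
          "C": ("2012-01-01", "old"),
          "D": ("2012-05-01", "duplicate"),
          "E": ("2012-06-05", "edited locally")}
plan = plan_sync(origin, origin_deleted, target, "2012-06-01")
print(plan)
```

The backup (step 1), the confirmation dialog (step 6), and the actual writes (step 7) are deliberately left out of this sketch.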

If we don't have a sync date, we need to diff all objects every time.
Note that for the certificate manager we would also need a diff structure on objects; see that mail discussion.
Note that you might want to sync 4 different databases (two local computers, one cloud, one mobile), so _date_ above is not unique but is just 'user input' to make sync go faster (no comparison of all objects).

Benny

2012/6/19 Doug Blank <[hidden email]>
On Mon, Jun 18, 2012 at 11:27 PM, Rob Healey <[hidden email]> wrote:
> Greetings:
>
> As complex as this is, I do not have much to say!  The only comment that I
> have is:
>
> I would like to see the changes, edits, modifications, etc to the database
> to be in a git repository...
>
> But then we must think of the normal user too, are they going to be able to
> use our software if we add an increased level of technology to it?
>
> Will they will willing to use it if they do NOT understand about a git
> repo...
>
> I was also thinking that for a user to make use of gramps-connect in the
> first place would require a certain level of technology any ways!

Rob,

I don't think we want to use git... that was just an analogy for
having the same kinds of functionality. For example, any git
repository, fork or original, contains the full history.

One can use git now, if you wanted to check your XML file in. But that
doesn't help at the object level. For example, if you changed a date
on an Event and you used git (or some other text-based revision
system) you'd have to export the full XML, make the commit which would
compute the diff. In the scenario I am proposing, a Gramps-patch would
know which object was changed, and could just update that single
object.

I wouldn't want to use a system that is too complex. I imagine that
there will only be a couple of options: "Update from master" (aka, svn
update), and "Commit Changes" (aka, svn commit).

-Doug

> Sincerely yours,
> Rob G. Healey


Re: Towards database synchonizing

jerome
In reply to this post by DS Blank
Note that 'plugins/lib/libmixin.py' already seems to provide
dictionaries for matching handles on import.


--- On Tue, 19.6.12, Benny Malengier <[hidden email]> wrote:

From: Benny Malengier <[hidden email]>
Subject: Re: [Gramps-devel] Towards database synchonizing
To: "Doug Blank" <[hidden email]>
Cc: "Gramps Development List" <[hidden email]>
Date: Tuesday 19 June 2012, 9:52

Deleted objects could be just a table with
handle, object_type, date_delete.
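
As a rough illustration only, such a table could look like this. sqlite3 is used here just because it ships with Python; it is not the Gramps storage backend, and the helper names `record_delete` and `deleted_since` are made up for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deleted_objects ("
    " handle TEXT NOT NULL,"
    " object_type TEXT NOT NULL,"
    " date_delete INTEGER NOT NULL,"   # Unix timestamp of the deletion
    " PRIMARY KEY (handle, object_type))"
)


def record_delete(conn, handle, object_type, when):
    """Remember that an object was deleted at time `when`."""
    conn.execute("INSERT OR REPLACE INTO deleted_objects VALUES (?, ?, ?)",
                 (handle, object_type, when))


def deleted_since(conn, since):
    """Return (handle, object_type, date_delete) rows newer than `since`."""
    return conn.execute(
        "SELECT handle, object_type, date_delete FROM deleted_objects"
        " WHERE date_delete > ? ORDER BY date_delete", (since,)).fetchall()
```

A sync would then call `deleted_since` with the sync date to get the deletions it has to propagate.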

Sync would be different from import. With sync, I understand that we can assume database handle values to be identical. This makes it simpler than a merge, where we don't have equal handles!


Sync has a target database that needs updating, and an origin database from which we update.
There could be a date: sync changes since _date_

All objects with a change date after _date_ have been updated (or created), deleted objects table has the objects which are deleted.


Sync is then :
1. backup revision of the family tree before sync
2. new objects => mark for insert
3. deleted objects, no change locally after delete date => mark for deletion
4. deleted objects, change locally => mark for user confirm for deletion

5. updated objects => do a diff on differences, mark origin values as new data
6. give overview to user on what will happen, ask for confirmation
7. do the sync
8. Tell user to do the inverse sync on the origin database so as to sync your local changes.
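
Steps 2 to 5 above could be sketched roughly as follows. This is only a planning pass over plain dicts, under the assumption that each object carries a change timestamp; `plan_sync` and the data layout are hypothetical, and the backup, confirmation and actual write (steps 1, 6, 7) are left out:

```python
def plan_sync(origin, target, deleted, since):
    """Return a dict of planned actions; nothing is modified here.

    origin, target: handle -> {"change": timestamp, "data": dict}
    deleted: iterable of (handle, object_type, date_delete) from the origin
    since: only origin changes after this timestamp are considered
    """
    plan = {"insert": [], "delete": [], "confirm_delete": [], "update": []}
    for handle, obj in origin.items():
        if obj["change"] <= since:
            continue                       # unchanged since the sync date
        if handle not in target:
            plan["insert"].append(handle)  # step 2: new object
        elif obj["data"] != target[handle]["data"]:
            plan["update"].append(handle)  # step 5: needs a diff
    for handle, _object_type, date_delete in deleted:
        if date_delete <= since or handle not in target:
            continue
        if target[handle]["change"] > date_delete:
            plan["confirm_delete"].append(handle)  # step 4: ask the user
        else:
            plan["delete"].append(handle)          # step 3: safe to delete
    return plan
```

The returned plan is what would be shown to the user for confirmation in step 6. Note the sketch does not handle an object that was deleted locally in the target and edited in the origin; it would simply be planned as an insert.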


If we don't have a sync date, we need to diff all objects every time.
Note that for the certificate manager we would also need a diff structure on objects, see that mail discussion.
Note that you might want to sync 4 different databases (two local computers, one cloud, one mobile), so _date_ above is not unique; it is just 'user input' to make the sync go faster (no need to compare all objects).


Benny


Re: Towards database synchonizing

DS Blank
In reply to this post by Benny Malengier
On Tue, Jun 19, 2012 at 3:52 AM, Benny Malengier
<[hidden email]> wrote:
> Deleted objects could be just a table with
> handle, object_type, date_delete.

Ok, I think that you are correct that the only persistent additional
data required is this list of deleted objects. Let's call this a Log
file. This Log is important to carry out the syncing. I guess it should
be exported with the Gramps XML as well.
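
One possible shape for such a Log, sketched as an append-only file with one JSON object per line. The field names follow the date-time / object-type / handle / change-made / commit-message columns proposed earlier in the thread; the function names and the ISO timestamp format are assumptions for illustration:

```python
import io
import json


def append_entry(log_file, when, object_type, handle, action, message=""):
    """Append one log entry as a single JSON line."""
    entry = {"when": when, "object_type": object_type, "handle": handle,
             "action": action, "message": message}
    log_file.write(json.dumps(entry, sort_keys=True) + "\n")


def read_entries(log_file, since=None):
    """Yield entries in file order, optionally only those after `since`.

    ISO 8601 timestamps of equal precision compare correctly as strings.
    """
    for line in log_file:
        entry = json.loads(line)
        if since is None or entry["when"] > since:
            yield entry


log = io.StringIO()   # stands in for a real file opened in append mode
append_entry(log, "2012-06-01T11:00:00", "person", "34763478324",
             "deleted", "Duplicate person")
append_entry(log, "2012-06-01T12:00:00", "person", "23984737847",
             "created", "Research on Monday")
```

Appending whole lines keeps the file easy to inspect by hand and means a crash can at worst damage the last entry.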

> Sync would be different from import. With sync, I understand that we can
> assume database handle values to be identical. This simplifies from merge
> where we don't have equal handles!
>
> Sync has a target database that needs updating, and an origin database from
> which we update.
> There could be a date: sync changes since _date_
>
> All objects with a change date after _date_ have been updated (or created),
> deleted objects table has the objects which are deleted.
>
> Sync is then :
> 1. backup revision of the family tree before sync
> 2. new objects => mark for insert
> 3. deleted objects, no change locally after delete date => mark for deletion
> 4. deleted objects, change locally => mark for user confirm for deletion
> 5. updated objects => do a diff on differences, mark origin values as new
> data
> 6. give overview to user on what will happen, ask for confirmation
> 7. do the sync
> 8. Tell user to do the inverse sync on the origin database so as to sync
> your local changes.
>
> If we don't have a sync date, we need to diff all objects every time.
> Note that for the certificate manager we would also need a diff structure on
> objects, see that mail discussion.
> Note that you might want to sync 4 different databases (two local computers,
> one cloud, one mobile), so _date_ above is not unique but is just 'user
> input' to make sync go faster (no compare of all objects).

I think that this is all correct. However, I think I would like one
additional related feature: I want to be able to look over either the
proposed changes in detail or the changes once they have been made.

I imagine something like the svn revision comparisons. I want to see
the commit message, and I want to see a diff between what it was and
what it became. Like:

2012/6/2 dsblank changed Person I0045:
  Reason: Was a typo based on S0045
  Before:
     Given name: Tomas
  After:
     Given name: Thomas
2012/6/3 shelper deleted Event E0005:
  Reason: Duplicated with Event E0006
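
Producing that kind of listing from stored before/after values is not much code. A sketch, assuming the before and after states are available as flat dicts of field name to value (the `format_change` helper is hypothetical):

```python
def format_change(date, user, action, object_type, gramps_id, reason,
                  before=None, after=None):
    """Render one change as text, listing only the fields that differ."""
    lines = ["%s %s %s %s %s:" % (date, user, action, object_type, gramps_id),
             "  Reason: %s" % reason]
    before, after = before or {}, after or {}
    changed = sorted(k for k in set(before) | set(after)
                     if before.get(k) != after.get(k))
    if changed:
        lines.append("  Before:")
        lines.extend("     %s: %s" % (k, before.get(k)) for k in changed)
        lines.append("  After:")
        lines.extend("     %s: %s" % (k, after.get(k)) for k in changed)
    return "\n".join(lines)


print(format_change("2012/6/2", "dsblank", "changed", "Person", "I0045",
                    "Was a typo based on S0045",
                    before={"Given name": "Tomas", "Surname": "Blank"},
                    after={"Given name": "Thomas", "Surname": "Blank"}))
```

Unchanged fields (the surname here) are left out, so the output stays short even for large objects.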

I think that this functionality is more important when collaborating,
because one has no idea why these changes are being made. Imagine
doing collaborative code development without such tools. But even when
working alone, such repository tools are necessary. Examples:

1) You start out making changes based on a new source. Corrected
dates, new child, etc. Only to find out that this is the wrong family.
You want to revert the changes you made last Tuesday between 11am and
11:30am.

2) A specific user changed a person's birthdate, and you don't feel
confident about it, so you want to ask about it. If you didn't have
the diff like above, you would not be aware of the specific change.
(We could generate a diff at the moment we decide whether to sync, but
after that it is gone).

3) A database gets corrupted. You can restore from backup, and apply
all of the changes using the Log file since then to get back to where
you were exactly.
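
Examples 1 and 3 could both be served by the same Log, provided each entry stores the object's state before and after the change. A sketch over plain dicts; `replay`, `revert_window` and the entry layout are assumptions for illustration:

```python
def replay(backup, log, since):
    """Return a copy of backup with all log entries after `since` reapplied."""
    db = dict(backup)
    for entry in sorted(log, key=lambda e: e["when"]):
        if entry["when"] <= since:
            continue
        if entry["action"] == "deleted":
            db.pop(entry["handle"], None)
        else:                                  # "created" or "edited"
            db[entry["handle"]] = entry["after"]
    return db


def revert_window(db, log, start, end):
    """Return a copy of db with entries in [start, end] undone, newest first."""
    db = dict(db)
    in_window = [e for e in log if start <= e["when"] <= end]
    for entry in reversed(sorted(in_window, key=lambda e: e["when"])):
        if entry["before"] is None:            # object was created: remove it
            db.pop(entry["handle"], None)
        else:                                  # edited or deleted: restore
            db[entry["handle"]] = entry["before"]
    return db
```

Reverting a window this way ignores any later edits to the same objects, so a real tool would have to detect those and ask the user.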

Even if we do add a more sophisticated Log file, I can see that some
people would not want to use it, so it could be optional, and Gramps
would work just as it is. It could take up some space, both on disk
and in the XML.

-Doug


Re: Towards database synchonizing

Nick Hall-6
Doug,

I don't have much to add, but this will be very useful functionality.

Your subversion analogy is good.  To sync two databases you need to find
the "diff" between them.  Then apply the "patch" after some user input to
resolve "conflicts".

Technically we could open two databases at the same time.  Perhaps we
could then add some "diff" functionality into gen.lib?

Suppose I have a database that I export for someone else to work on.  
They make some changes and send me back a modified xml.  What I want to
do is see the changes that they have made.

I would like to open my original database, and then import their xml
into another database (both open at the same time).  Then it would be
nice to see their changes in a "graphical diff" view.  Finally I could
decide how to sync the changes.  Could we use a Dictionary Database for
the imported xml?
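
To make the two-databases idea concrete, here is a rough sketch of a handle-based diff. Both databases are modeled as plain dicts of handle to flat field dicts, which is a simplification of real Gramps objects; `diff_objects` and `diff_databases` are hypothetical names, not existing gen.lib functions:

```python
def diff_objects(mine, theirs):
    """Return {field: (my_value, their_value)} for fields that differ."""
    fields = set(mine) | set(theirs)
    return {f: (mine.get(f), theirs.get(f))
            for f in fields if mine.get(f) != theirs.get(f)}


def diff_databases(mine, theirs):
    """Compare two handle -> object dicts, matching objects by handle."""
    added = sorted(set(theirs) - set(mine))      # only in their database
    removed = sorted(set(mine) - set(theirs))    # only in my database
    changed = {}
    for handle in set(mine) & set(theirs):
        delta = diff_objects(mine[handle], theirs[handle])
        if delta:
            changed[handle] = delta
    return added, removed, changed
```

A graphical diff view would then only need to render the three results. Without a deletion record, "removed" cannot tell a deletion on their side from a creation on mine.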

In this case we could use Gramps handles to match objects, but would it
be useful to add a UUID at this point?
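
If a UUID were added, Python's standard `uuid` module would be enough to generate one. A small sketch, assuming objects are dicts and that the identifier lives in a hypothetical "uuid" field assigned once and never changed:

```python
import uuid


def ensure_uuid(obj):
    """Give obj a 'uuid' field once; later calls leave it unchanged."""
    if "uuid" not in obj:
        obj["uuid"] = str(uuid.uuid4())   # random (version 4) UUID
    return obj["uuid"]


def index_by_uuid(objects):
    """Build a uuid -> object lookup for matching across databases."""
    return {ensure_uuid(obj): obj for obj in objects}
```

The identifier would have to travel with the object through export and import for the matching to work.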

I can also see why a log file was suggested to keep track of changes
rather than doing a "diff".  At the moment the Gramps log files don't
store detailed enough information.  This is certainly worth
consideration though.

Regards,

Nick.


On 19/06/12 13:44, Doug Blank wrote:

> On Tue, Jun 19, 2012 at 3:52 AM, Benny Malengier
> <[hidden email]>  wrote:
>> Deleted objects could be just a table with
>> handle, object_type, date_delete.
> Ok, I think that you are correct that the only persistent additional
> data required is this list of deleted objects. Let's call this a Log
> file. This Log is import to carry out the syncing. I guess it should
> be exported with the Gramps XML as well.
>
>> Sync would be different from import. With sync, I understand that we can
>> assume database handle values to be identical. This simplifies from merge
>> where we don't have equal handles!
>>
>> Sync has a target database that needs updating, and an origin database from
>> which we update.
>> There could be a date: sync changes since _date_
>>
>> All objects with a change date after _date_ have been updated (or created),
>> deleted objects table has the objects which are deleted.
>>
>> Sync is then :
>> 1. backup revision of the family tree before sync
>> 2. new objects =>  mark for insert
>> 3. deleted objects, no change locally after delete date =>  mark for deletion
>> 4. deleted objects, change locally =>  mark for user confirm for deletion
>> 5. updated objects =>  do a diff on differences, mark origin values as new
>> data
>> 6. give overview to user on what will happen, ask for confirmation
>> 7. do the sync
>> 8. Tell user to do the inverse sync on the origin database so as to sync
>> your local changes.
>>
>> If we don't have a sync date, we need to everytime diff all objects.
>> Note that for the certificate manager we would also need a diff structure on
>> objects, see that mail discussion.
>> Note that you might want to sync 4 different databases (two local computers,
>> one cloud, one mobile), so _date_ above is not unique but is just 'user
>> input' to make sync go faster (no compare of all objects).
> I think that this is all correct. However, I think I would like one
> additional related feature: I want to be able to look over either the
> proposed changes in detail or the changes once they have been made.
>
> I imagine something like the svn revision comparisons. I want to see
> the commit message, and I want to see a diff between what it was and
> what it became. Like:
>
> 2012/6/2 dsblank changed Person I0045:
>    Reason: Was a typo based on S0045
>    Before:
>       Given name: Tomas
>    After:
>       Given name: Thomas
> 2012/6/3 shelper deleted Event E0005:
>    Reason: Duplicated with Event E0006
>
> I think that this functionality is more important when collaborating,
> because one has no idea why these changes are being made. Imagine
> doing collaborative code development without such tools. But even when
> working alone, such repository tools are necessary. Examples:
>
> 1) You start out making changes based on a new source. Corrected
> dates, new child, etc. Only to find out that this is the wrong family.
> You want to revert the changes you made last Tuesday between 11am and
> 11:30am.
>
> 2) A specific user changed a person's birthdate, and you don't feel
> confident about it, so you want to ask about it. If you didn't have
> the diff like above, you would not be aware of the specific change.
> (We could generate a diff when we decide to sync or not, but then it
> is gone).
>
> 3) A database gets corrupted. You can restore from back up, and apply
> all of the changes using the Log file since then to get back to where
> you were exactly.
>
> Even if we do add a more sophisticated Log file, I can see that some
> people would not want to use it, so it could be optional, and Gramps
> would work just as it is. It could take up some space, both on disk
> and in the XML.
>
> -Doug
>
>> Benny
>>
>>
>> 2012/6/19 Doug Blank<[hidden email]>
>>> On Mon, Jun 18, 2012 at 11:27 PM, Rob Healey<[hidden email]>  wrote:
>>>> Greetings:
>>>>
>>>> As complex as this is, I do not have much to say!  The only comment that
>>>> I
>>>> have is:
>>>>
>>>> I would like to see the changes, edits, modifications, etc to the
>>>> database
>>>> to be in a git repository...
>>>>
>>>> But then we must think of the normal user too, are they going to be able
>>>> to
>>>> use our software if we add an increased level of technology to it?
>>>>
>>>> Will they will willing to use it if they do NOT understand about a git
>>>> repo...
>>>>
>>>> I was also thinking that for a user to make use of gramps-connect in the
>>>> first place would require a certain level of technology any ways!
>>> Rob,
>>>
>>> I don't think we want to use git... that was just an analogy for
>>> having the same kinds of functionality. For example, any git
>>> repository, fork or original, contains the full history.
>>>
>>> One can use git now, if you wanted to check your XML file in. But that
>>> doesn't help at the object level. For example, if you changed a date
>>> on an Event and you used git (or some other text-based revision
>>> system) you'd have to export the full XML, make the commit which would
>>> compute the diff. In the scenario I am proposing, a Gramps-patch would
>>> know which object was changed, and could just update that single
>>> object.
>>>
>>> I wouldn't want to use a system that is too complex. I imagine that
>>> there will only be a couple of options: "Update from master" (aka, svn
>>> update), and "Commit Changes" (aka, svn commit).
>>>
>>> -Doug
>>>
>>>> Sincerely yours,
>>>> Rob G. Healey
>>>>
>>>>

Re: Towards database synchonizing

DS Blank
On Fri, Jun 22, 2012 at 10:51 AM, Nick Hall <[hidden email]> wrote:

> Doug,
>
> I don't have much to add, but this will be very useful functionality.
>
> Your subversion analogy is good.  To sync two databases you need to find
> the "diff" between them.  Then apply the "patch" after some user input to
> resolve "conflicts".
>
> Technically we could open two databases at the same time.  Perhaps we
> could then add some "diff" functionality into gen.lib?
>
> Suppose I have a database that I export for someone else to work on.
> They make some changes and send me back a modified xml.  What I want to
> do is see the changes that they have made.
>
> I would like to open my original database, and then import their xml
> into another database (both open at the same time).  Then it would be
> nice to see their changes in a "graphical diff" view.  Finally I could
> decide how to sync the changes.  Could we use a Dictionary Database for
> the imported xml?
>
> In this case we could use Gramps handles to match objects, but would it
> be useful to add a UUID at this point?
>
> I can also see why a log file was suggested to keep track of changes
> rather than doing a "diff".  At the moment the Gramps log files don't
> store detailed enough information.  This is certainly worth
> consideration though.
>
> Regards,
>
> Nick.
>

Thanks for the comments, all. I've just added
trunk/src/gen/merge/diff.py, which is the beginning of a set of tools to
compute differences between dbs and items. It uses the DictionaryDb in
its current state (Gerald is looking at it to make it better; Gerald, I
fixed a few minor issues there). That loads everything into memory,
which may not be viable... we may eventually need to make a temp db
somewhere on disk instead.

The diff_dbs(db1, db2) code is similar to the merge step of a merge
sort: it sorts the handles and then goes through all of the items just
once, so the comparison pass is O(n1 + n2). It creates a list of diffs
and of the objects missing from each db.
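
A rough sketch of such a single-pass comparison follows. It is not the code in diff.py: it assumes each database can be treated as a plain dict mapping a handle to comparable data, and the name diff_handle_maps is made up for illustration.

```python
def diff_handle_maps(db1, db2):
    """Compare two handle -> data mappings in one merge-style pass.

    Hypothetical stand-in for diff_dbs(): plain dicts play the role of
    the two databases.
    """
    handles1 = sorted(db1)
    handles2 = sorted(db2)
    diffs, missing_in_db1, missing_in_db2 = [], [], []
    i = j = 0
    # Walk both sorted handle lists once, like the merge step of merge sort.
    while i < len(handles1) and j < len(handles2):
        h1, h2 = handles1[i], handles2[j]
        if h1 == h2:
            # Same object in both dbs: record it only if the data differs.
            if db1[h1] != db2[h2]:
                diffs.append((h1, db1[h1], db2[h2]))
            i += 1
            j += 1
        elif h1 < h2:
            # h1 cannot appear later in handles2, so db2 lacks it.
            missing_in_db2.append(h1)
            i += 1
        else:
            missing_in_db1.append(h2)
            j += 1
    # Whatever is left on one side is missing from the other.
    missing_in_db2.extend(handles1[i:])
    missing_in_db1.extend(handles2[j:])
    return diffs, missing_in_db1, missing_in_db2


old = {"a1": {"name": "Tomas"}, "b2": {"name": "Anna"}, "c3": {"name": "Eva"}}
new = {"a1": {"name": "Thomas"}, "c3": {"name": "Eva"}, "d4": {"name": "Karl"}}
diffs, missing_in_old, missing_in_new = diff_handle_maps(old, new)
```

Sorting still costs O(n log n) up front; only the walk itself is linear in the number of handles.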

To make this easier, I've added Object.to_json() methods to all Gramps
objects. "Json" is used here as a generic name for a representation made
up of dicts, lists, and values. A Json representation (or something like
it) makes it easier to do programmatic comparisons between objects.
I've attached a patch that adds the to_json methods to [1], and will
keep notes there.
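
To illustrate why a dict/list/value representation helps, here is a minimal sketch of a recursive comparison. It assumes Python 3, and the field names in the sample data are invented; they are not the actual output of to_json().

```python
def diff_struct(old, new, path=""):
    """Yield (path, old_value, new_value) for every leaf that differs.

    A key or list item that is absent on one side is reported as None,
    so "missing" and "explicitly None" are not distinguished here.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            sub_path = "%s.%s" % (path, key) if path else key
            yield from diff_struct(old.get(key), new.get(key), sub_path)
    elif isinstance(old, list) and isinstance(new, list):
        for index in range(max(len(old), len(new))):
            left = old[index] if index < len(old) else None
            right = new[index] if index < len(new) else None
            yield from diff_struct(left, right, "%s[%d]" % (path, index))
    elif old != new:
        yield (path, old, new)


# Invented sample data standing in for two versions of one person.
before = {
    "gramps_id": "I0045",
    "primary_name": {"first_name": "Tomas", "surname_list": ["Blank"]},
}
after = {
    "gramps_id": "I0045",
    "primary_name": {"first_name": "Thomas", "surname_list": ["Blank", "Blanck"]},
}
changes = list(diff_struct(before, after))
```

Each reported path names the exact field that changed, which is the kind of detail a before/after view of an edit would need.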

After applying the patch, one can test some basic functionality. In
the Python Gramplet, one can say:

>>> from gen.merge.diff import *
>>> diffs, missing1, missing2 = diff_db_to_file(db, "/path/to/data.gramps")

That will load the "/path/to/data.gramps" into a DictionaryDb and then
do the comparison, print out differences to stdout, and return the
diffs, and the missing objects (missing in old, and missing in new).

Next, I'll begin to allow a user to select what should be
added/deleted/edited, perhaps being able to use the Merge functions.
I will also explore the datetime-based ideas, a log of deleted
items, and UUIDs.

But first, I wanted to get feedback on the idea of adding the
Object.to_json() method to every object.

-Doug

[1] - http://www.gramps-project.org/bugs/view.php?id=2623


Re: Towards database synchonizing

jerome
> That does load everything into memory, which may not be viable...
[snip]
> That will load the "/path/to/data.gramps" into a DictionaryDb and then
> do the comparison, print out differences to stdout, and return the
> diffs, and the missing objects (missing in old, and missing in new).

Why not load only the Gramps XML into memory and then compare it
with the open database (state, read-only)?

For example, check 'self.dbstate.db.db_is_open' and use something like:

# Flags this snippet relies on; the module it comes from (see Source
# below) defines them before the imports.
Verbose_import_ = False
(XMLParser_import_none, XMLParser_import_lxml,
 XMLParser_import_elementtree) = range(3)
XMLParser_import_library = None

# Try the fastest available ElementTree implementation first.
try:
    # lxml
    from lxml import etree as etree_
    XMLParser_import_library = XMLParser_import_lxml
    if Verbose_import_:
        print("running with lxml.etree")
except ImportError:
    try:
        # cElementTree from Python 2.5+
        import xml.etree.cElementTree as etree_
        XMLParser_import_library = XMLParser_import_elementtree
        if Verbose_import_:
            print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # ElementTree from Python 2.5+
            import xml.etree.ElementTree as etree_
            XMLParser_import_library = XMLParser_import_elementtree
            if Verbose_import_:
                print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree_
                XMLParser_import_library = XMLParser_import_elementtree
                if Verbose_import_:
                    print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree_
                    XMLParser_import_library = XMLParser_import_elementtree
                    if Verbose_import_:
                        print("running with ElementTree")
                except ImportError:
                    raise ImportError(
                        "Failed to import ElementTree from any known place")

Source: http://cutter.rexx.com/~dkuhlman/
See also: http://lxml.de/performance.html

Gramps 1.0.x loaded the Gramps XML into memory!
It was limited by the XML wrappers/handlers provided with Python at the
time, and by memory usage.

Now, a simple sample that should work on all Python versions, once a
Gramps XML file is loaded in memory:

# find() needs memory - /!\ large files
# Compare with None: an Element without children tests as false.
if one.find(NAMESPACE + 'event') is not None:
    print('XML: Find all "event" records: %s' %
          len(one.findall(NAMESPACE + 'event')))

This makes it very easy to see what we are doing ('one' being the first
level!), much like filtering, with the benefit of a markup language.
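
A self-contained variant of the same idea, as a sketch only: the XML below is a tiny hand-written sample, and the namespace version string is an assumption that should be checked against the real file. It uses only the standard library ElementTree.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; the version part depends on the Gramps XML format in use.
NAMESPACE = "{http://gramps-project.org/xml/1.5.0/}"

# Hand-written sample standing in for a real Gramps XML export.
SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.5.0/">
  <events>
    <event handle="_e1" id="E0005"><type>Birth</type></event>
    <event handle="_e2" id="E0006"><type>Death</type></event>
  </events>
</database>
"""

root = ET.fromstring(SAMPLE)

# 'events' is the first-level container; its children are the records.
events = root.find(NAMESPACE + "events")
if events is not None:
    event_handles = [e.get("handle") for e in events.findall(NAMESPACE + "event")]
else:
    event_handles = []

print('XML: Find all "event" records: %s' % len(event_handles))
```

The handles collected this way could then be looked up in the open database to decide which records are new, changed, or missing.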

This way was also used for an experimental 'Database compare and merge',
see: http://sourceforge.net/mailarchive/message.php?msg_id=28173190
http://gramps.1791082.n4.nabble.com/file/n3866064/GrampsCompareV02.py

I have no idea how to translate this into current DB logic, but this
sounds closer to SQL and hierarchical/level/factory logic.


Jérôme



Doug Blank a écrit :

> On Fri, Jun 22, 2012 at 10:51 AM, Nick Hall <[hidden email]> wrote:
>> Doug,
>>
>> I don't have much to add, but this will be very useful functionality.
>>
>> Your subversion analogy is good.  To sync two databases you need to find
>> the "diff" between them.  The apply the "patch" after some user input to
>> resolve "conflicts".
>>
>> Technically we could open two databases at the same time.  Perhaps we
>> could then add some "diff" functionality into gen.lib?
>>
>> Suppose I have a database that I export for someone else to work on.
>> They make some changes and send me back a modified xml.  What I want to
>> do is see the changes that they have made.
>>
>> I would like to open my original database, and then import their xml
>> into another database (both open at the same time).  Then it would be
>> nice to see their changes in a "graphical diff" view.  Finally I could
>> decide how to sync the changes.  Could we use a Dictionary Database for
>> the imported xml?
>>
>> In this case we could use Gramps handles to match objects, but would it
>> be useful to add a UUID at this point?
>>
>> I can also see why a log file was suggested to keep track of changes
>> rather than doing a "diff".  At the moment the Gramps log files don't
>> store detailed enough information.  This is certainly worth
>> consideration though.
>>
>> Regards,
>>
>> Nick.
>>
>
> Thanks for the comments, all. I've just added
> trunk/src/gen/merge/diff.py which is the beginning of tools to compute
> differences between dbs and items. It does use the DictionaryDb (in
> its current state, but Gerald is looking at it to make it better.
> Gerald, I fixed a few minor issues there). That does load everything
> into memory, which may not be viable... we may need to just make a
> temp db somewhere on disk eventually.
>
> The diff_dbs(db1, db2) code is similar to a merge_sort algorithm: it
> sorts the handles and goes through all of the items just once, making
> it O(n1 + n2) creating a list of diffs and missing objects in each db.
>
> To make this easier, I've added Object.to_json() methods to all Gramps
> objects. Json is used as a generic name for a representation made up
> of dicts, lists, and values. A Json representation (or something like
> it) makes it easier to do programmatic comparisons between objects.
> I've added a patch which adds the to_json methods to [1], and will
> keep notes there.
>
> After applying the patch, one can test some basic functionality, in
> the Python Gramplet, one can say:
>
>>>> from gen.merge.diff import *
>>>> diffs, missing1, missing2 = diff_db_to_file(db, "/path/to/data.gramps")
>
> That will load the "/path/to/data.gramps" into a DictionaryDb and then
> do the comparison, print out differences to stdout, and return the
> diffs, and the missing objects (missing in old, and missing in new).
>
> Next, I'll begin to allow a user to select what should be
> added/deleted/edited, perhaps being able to use the Merge functions.
> Also, will explore the datetime-based ideas, and a log of deleted
> items. And UUID.
>
> But first, I wanted to get feedback on the idea of adding the
> Object.to_json() method to every object.
>
> -Doug
>
> [1] - http://www.gramps-project.org/bugs/view.php?id=2623
>
>> On 19/06/12 13:44, Doug Blank wrote:
>>> On Tue, Jun 19, 2012 at 3:52 AM, Benny Malengier
>>> <[hidden email]>  wrote:
>>>> Deleted objects could be just a table with
>>>> handle, object_type, date_delete.
>>> Ok, I think that you are correct that the only persistent additional
>>> data required is this list of deleted objects. Let's call this a Log
>>> file. This Log is import to carry out the syncing. I guess it should
>>> be exported with the Gramps XML as well.
>>>
>>>> Sync would be different from import. With sync, I understand that we can
>>>> assume database handle values to be identical. This simplifies from merge
>>>> where we don't have equal handles!
>>>>
>>>> Sync has a target database that needs updating, and an origin database from
>>>> which we update.
>>>> There could be a date: sync changes since _date_
>>>>
>>>> All objects with a change date after _date_ have been updated (or created),
>>>> deleted objects table has the objects which are deleted.
>>>>
>>>> Sync is then :
>>>> 1. backup revision of the family tree before sync
>>>> 2. new objects =>  mark for insert
>>>> 3. deleted objects, no change locally after delete date =>  mark for deletion
>>>> 4. deleted objects, change locally =>  mark for user confirm for deletion
>>>> 5. updated objects =>  do a diff on differences, mark origin values as new
>>>> data
>>>> 6. give overview to user on what will happen, ask for confirmation
>>>> 7. do the sync
>>>> 8. Tell user to do the inverse sync on the origin database so as to sync
>>>> your local changes.
>>>>
>>>> If we don't have a sync date, we need to everytime diff all objects.
>>>> Note that for the certificate manager we would also need a diff structure on
>>>> objects, see that mail discussion.
>>>> Note that you might want to sync 4 different databases (two local computers,
>>>> one cloud, one mobile), so _date_ above is not unique but is just 'user
>>>> input' to make sync go faster (no compare of all objects).
>>> I think that this is all correct. However, I think I would like one
>>> additional related feature: I want to be able to look over either the
>>> proposed changes in detail or the changes once they have been made.
>>>
>>> I imagine something like the svn revision comparisons. I want to see
>>> the commit message, and I want to see a diff between what it was and
>>> what it became. Like:
>>>
>>> 2012/6/2 dsblank changed Person I0045:
>>>    Reason: Was a typo based on S0045
>>>    Before:
>>>       Given name: Tomas
>>>    After:
>>>       Given name: Thomas
>>> 2012/6/3 shelper deleted Event E0005:
>>>    Reason: Duplicated with Event E0006
>>>
>>> I think that this functionality is more important when collaborating,
>>> because one has no idea why these changes are being made. Imagine
>>> doing collaborative code development without such tools. But even when
>>> working alone, such repository tools are necessary. Examples:
>>>
>>> 1) You start out making changes based on a new source. Corrected
>>> dates, new child, etc. Only to find out that this is the wrong family.
>>> You want to revert the changes you made last Tuesday between 11am and
>>> 11:30am.
>>>
>>> 2) A specific user changed a person's birthdate, and you don't feel
>>> confident about it, so you want to ask about it. If you didn't have
>>> the diff like above, you would not be aware of the specific change.
>>> (We could generate a diff when we decide to sync or not, but then it
>>> is gone).
>>>
>>> 3) A database gets corrupted. You can restore from backup, and apply
>>> all of the changes using the Log file since then to get back to where
>>> you were exactly.
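
Examples 1 and 3 above could look something like this, assuming the Log keeps a "before" and "after" serialization for every change. That is an assumption, as are the entry layout, the integer timestamps, and the function names replay and revert. The database is modeled as a plain dict keyed by ID.

```python
def _set(db, handle, value):
    # None stands for "object does not exist" (before a create, after a delete).
    if value is None:
        db.pop(handle, None)
    else:
        db[handle] = value


def replay(db, log, since):
    """Re-apply logged changes made after `since` to a restored backup."""
    for entry in log:
        if entry["time"] > since:
            _set(db, entry["handle"], entry["after"])


def revert(db, log, start, end):
    """Undo logged changes made between start and end, newest first."""
    for entry in reversed(log):
        if start <= entry["time"] <= end:
            _set(db, entry["handle"], entry["before"])


log = [
    {"time": 1, "handle": "I0045",
     "before": {"given": "Tomas"}, "after": {"given": "Thomas"}},
    {"time": 2, "handle": "E0005",
     "before": {"type": "Birth"}, "after": None},
]
backup = {"I0045": {"given": "Tomas"}, "E0005": {"type": "Birth"}}

current = dict(backup)
replay(current, log, since=0)         # example 3: backup plus log
after_replay = dict(current)

revert(current, log, start=2, end=2)  # example 1: undo one time window
print(after_replay)
print(current)
```

Reverting in reverse order matters when the same object was changed more than once inside the window.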
>>>
>>> Even if we do add a more sophisticated Log file, I can see that some
>>> people would not want to use it, so it could be optional, and Gramps
>>> would work just as it is. It could take up some space, both on disk
>>> and in the XML.
>>>
>>> -Doug
>>>
>>>> Benny
>>>>
>>>>
>>>> 2012/6/19 Doug Blank<[hidden email]>
>>>>> On Mon, Jun 18, 2012 at 11:27 PM, Rob Healey<[hidden email]>  wrote:
>>>>>> Greetings:
>>>>>>
>>>>>> As complex as this is, I do not have much to say!  The only comment that
>>>>>> I
>>>>>> have is:
>>>>>>
>>>>>> I would like to see the changes, edits, modifications, etc to the
>>>>>> database
>>>>>> to be in a git repository...
>>>>>>
>>>>>> But then we must think of the normal user too, are they going to be able
>>>>>> to
>>>>>> use our software if we add an increased level of technology to it?
>>>>>>
>>>>>> Will they be willing to use it if they do NOT understand about a git
>>>>>> repo...
>>>>>>
>>>>>> I was also thinking that for a user to make use of gramps-connect in the
>>>>>> first place would require a certain level of technology anyway!
>>>>> Rob,
>>>>>
>>>>> I don't think we want to use git... that was just an analogy for
>>>>> having the same kinds of functionality. For example, any git
>>>>> repository, fork or original, contains the full history.
>>>>>
>>>>> One can use git now, if you wanted to check your XML file in. But that
>>>>> doesn't help at the object level. For example, if you changed a date
>>>>> on an Event and you used git (or some other text-based revision
>>>>> system) you'd have to export the full XML, make the commit which would
>>>>> compute the diff. In the scenario I am proposing, a Gramps-patch would
>>>>> know which object was changed, and could just update that single
>>>>> object.
>>>>>
>>>>> I wouldn't want to use a system that is too complex. I imagine that
>>>>> there will only be a couple of options: "Update from master" (aka, svn
>>>>> update), and "Commit Changes" (aka, svn commit).
>>>>>
>>>>> -Doug
>>>>>
>>>>>> Sincerely yours,
>>>>>> Rob G. Healey
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 18, 2012 at 7:26 AM, Doug Blank<[hidden email]>
>>>>>> wrote:
>>>>>>> Devs,
>>>>>>>
>>>>>>> First, http://Gramps-Connect.org is coming along (and kept up-to-date
>>>>>>> online):
>>>>>>>
>>>>>>> * can browse all data
>>>>>>> * can edit all of the core data on all of the main objects
>>>>>>> * can delete all of the main objects (currently just deletes, no
>>>>>>> warning, no undo)
>>>>>>> * can edit main parts of names, surnames
>>>>>>> * can run any report, import, and export from the web (no tools though)
>>>>>>> * three levels of permissions:
>>>>>>> ** not logged in: only see non-private, non-living data
>>>>>>> ** logged in, but not superuser: can see all data, export and run
>>>>>>> reports
>>>>>>> ** logged in, superuser: can edit, delete, import data
>>>>>>> * can edit notes with markup
>>>>>>> * can add children to families, events to people, etc
>>>>>>> * can change CSS of site
>>>>>>> * can change site name
>>>>>>>
>>>>>>> This is of course still very much alpha, but I've put my family tree
>>>>>>> online and have started doing simple edits. It tastes like dogfood,
>>>>>>> but either I'm getting used to it, or it gets a little better every day
>>>>>>> :)
>>>>>>>
>>>>>>> One of the first things that one wants to do is merge the changes made
>>>>>>> on-line with a master database. We all have made some initial notes
>>>>>>> here:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge
>>>>>>> http://www.gramps-project.org/bugs/view.php?id=5853
>>>>>>> http://www.gramps-project.org/bugs/view.php?id=2623
>>>>>>>
>>>>>>> Now, I'm seriously thinking about how to do this, perhaps starting
>>>>>>> with something simple. I'm thinking that there are three different
>>>>>>> analogies:
>>>>>>>
>>>>>>> 1) diff and patch: keep track of all edits, deletes, and additions and
>>>>>>> create a type of patch file that gets applied to another database.
>>>>>>>
>>>>>>> 2) subversion: there is a master database, and all patches are
>>>>>>> incorporated there. A special re-sync could get sent out to
>>>>>>> checked-out versions.
>>>>>>>
>>>>>>> 3) git: all databases are full repositories, and can be forked, merged.
>>>>>>>
>>>>>>> Perhaps starting with #1 is the easiest, and could lead to the others.
>>>>>>> But even with that option, there appears to be additional data that
>>>>>>> needs to be kept. For example, say I delete an object in a database;
>>>>>>> how do I keep track of that, to be able to send it to the other
>>>>>>> database?
>>>>>>>
>>>>>>> At a bare minimum, it seems like we need a persistent representation
>>>>>>> of:
>>>>>>>
>>>>>>> date-time, object-type, handle, change-made, commit-message
>>>>>>> 2012/6/1 11:00:00, person, 34763478324, deleted, Duplicate person
>>>>>>> 2012/6/1 12:00:00, person, 23984737847, created, Research on Monday
>>>>>>> 2012/6/1 12:00:00, source, 38734763786, created, Research on Monday
>>>>>>> 2012/6/1 12:00:00, citation, 34834767346, created, Research on Monday
>>>>>>> 2012/6/1 12:01:00, person, 23984737847, edited, Typo on given name
>>>>>>>
>>>>>>> (The commit message is not strictly needed, but I am finding it to be
>>>>>>> quite useful.) From this it seems that a patch-like file (written in
>>>>>>> xml?) could be made, given a start date-time. Applying the patch may
>>>>>>> come into conflicts, but that is a separate issue, I think. We could
>>>>>>> also include more information here in the persistent storage (for
>>>>>>> example, before and after serializations).
>>>>>>>
>>>>>>> So, if this sounds correct, where/how should the data be stored?
>>>>>>> Perhaps just a text file that we can append onto would be safe and
>>>>>>> sturdy? Should we reuse the XML representation of the data for the
>>>>>>> patch? That sounds best, as we already can read/write those. But a
>>>>>>> json file would be easy, too. (Could just use raw Python
>>>>>>> serialization, but that could get messy when dealing with database
>>>>>>> upgrades).
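
A minimal sketch of the append-only text file idea, using one JSON object per line with the five fields listed above. The field names, the helper names, and the use of io.StringIO in place of a real file are assumptions made for illustration only.

```python
import io
import json
from datetime import datetime

FIELDS = ("datetime", "object_type", "handle", "change", "message")


def append_entry(stream, when, object_type, handle, change, message):
    """Append one change record as a single JSON line."""
    values = (when.isoformat(), object_type, handle, change, message)
    stream.write(json.dumps(dict(zip(FIELDS, values))) + "\n")


def entries_since(stream, start):
    """Return the records logged at or after `start`, in log order."""
    stream.seek(0)
    records = (json.loads(line) for line in stream if line.strip())
    return [r for r in records
            if datetime.fromisoformat(r["datetime"]) >= start]


log = io.StringIO()  # stands in for a file opened in append mode
append_entry(log, datetime(2012, 6, 1, 11, 0), "person", "34763478324",
             "deleted", "Duplicate person")
append_entry(log, datetime(2012, 6, 1, 12, 0), "person", "23984737847",
             "created", "Research on Monday")
append_entry(log, datetime(2012, 6, 1, 12, 1), "person", "23984737847",
             "edited", "Typo on given name")

# The raw material for a patch, given a start date-time.
patch = entries_since(log, datetime(2012, 6, 1, 12, 0))
print([(r["handle"], r["change"]) for r in patch])
```

One record per line keeps appends cheap and means a truncated last line damages only that one record.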
>>>>>>>
>>>>>>> Comments, ideas welcomed,
>>>>>>>
>>>>>>> -Doug
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Live Security Virtual Conference
>>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>>> threat landscape has changed and how IT managers can respond.
>>>>>>> Discussions
>>>>>>> will include endpoint security, mobile security and the latest in
>>>>>>> malware
>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>>> _______________________________________________
>>>>>>> Gramps-devel mailing list
>>>>>>> [hidden email]
>>>>>>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sincerely yours,
>>>>>> Rob G. Healey
>>>>>>
>>>>>>

Re: Towards database synchonizing

DS Blank
On Thu, Jul 12, 2012 at 3:30 AM, Jérôme <[hidden email]> wrote:

>> That does load everything into memory, which may not be viable...
>
> [snip]
>
>> That will load the "/path/to/data.gramps" into a DictionaryDb and then
>> do the comparison, print out differences to stdout, and return the
>> diffs, and the missing objects (missing in old, and missing in new).
>
>
> Why not load only the Gramps XML into memory and then compare it with the
> database (state, read-only)?

I did consider XML as a representation for comparison for each object,
but it is easier to use something closer to Python. The json-like
representation I'm using makes it easy to find where two objects
differ. It also could be used to generate XML for each object, which
would be useful in other ways. Currently the exporter is the only
place where objects are created in XML. We could move that so that
each object was responsible for that conversion.
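
To illustrate why a json-like representation makes differences easy to find, here is a small sketch of a recursive comparison over dicts, lists, and values. It is not the code in diff.py; the function name diff_json, the path notation, and the sample field names are invented for this example.

```python
def diff_json(old, new, path=""):
    """Yield (path, old_value, new_value) for each difference between two
    structures made of dicts, lists, and plain values.

    A key present on only one side is reported with None on the other.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            sub = "%s.%s" % (path, key) if path else str(key)
            if key not in old:
                yield (sub, None, new[key])
            elif key not in new:
                yield (sub, old[key], None)
            else:
                yield from diff_json(old[key], new[key], sub)
    elif (isinstance(old, list) and isinstance(new, list)
          and len(old) == len(new)):
        for index, (a, b) in enumerate(zip(old, new)):
            yield from diff_json(a, b, "%s[%d]" % (path, index))
    elif old != new:
        # Plain values, mismatched types, or lists of different length.
        yield (path, old, new)


before = {"gramps_id": "I0045",
          "name": {"first": "Tomas", "surnames": ["Blank"]}}
after = {"gramps_id": "I0045",
         "name": {"first": "Thomas", "surnames": ["Blank"]}}
print(list(diff_json(before, after)))
```

Lists of different length are reported as a single changed value here; a real tool would probably want something smarter for reordered or inserted list items.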

-Doug

> Like having 'self.dbstate.db.db_is_open' with something like :
>
> try:
>     # lxml
>     from lxml import etree as etree_
>     XMLParser_import_library = XMLParser_import_lxml
>     if Verbose_import_:
>         print("running with lxml.etree")
> except ImportError:
>     try:
>         # cElementTree from Python 2.5+
>         import xml.etree.cElementTree as etree_
>         XMLParser_import_library = XMLParser_import_elementtree
>         if Verbose_import_:
>             print("running with cElementTree on Python 2.5+")
>     except ImportError:
>         try:
>             # ElementTree from Python 2.5+
>             import xml.etree.ElementTree as etree_
>             XMLParser_import_library = XMLParser_import_elementtree
>             if Verbose_import_:
>                 print("running with ElementTree on Python 2.5+")
>         except ImportError:
>             try:
>                 # normal cElementTree install
>                 import cElementTree as etree_
>                 XMLParser_import_library = XMLParser_import_elementtree
>                 if Verbose_import_:
>                     print("running with cElementTree")
>             except ImportError:
>                 try:
>                     # normal ElementTree install
>                     import elementtree.ElementTree as etree_
>                     XMLParser_import_library = XMLParser_import_elementtree
>                     if Verbose_import_:
>                         print("running with ElementTree")
>                 except ImportError:
>                     raise ImportError("Failed to import ElementTree from any known place")
>
> Source: http://cutter.rexx.com/~dkuhlman/
> See also: http://lxml.de/performance.html
>
> Gramps 1.0.x loaded the Gramps XML into memory!
> It was limited by memory usage and by the XML wrappers/handlers that came
> with Python at the time.
>
> Now, a simple sample that should work on all Python versions once a Gramps
> XML file is loaded into memory:
>
> # find() needs memory - /!\ large files
> if one.find(NAMESPACE + 'event') is not None:
>     print('XML: Find all "event" records: %s' %
>           len(one.findall(NAMESPACE + 'event')))
>
> This makes it very easy to see what we are doing ('one' being the first
> level!), much like filtering, with the benefit of markup languages.
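
For readers who want to try the find()/findall() sample above, here is a self-contained variant using only the standard library. The inline XML is a made-up fragment, and the namespace string is an assumption (the real one depends on the Gramps XML version). Note the comparison against None: an Element without children is falsy, so a bare "if element:" test can give the wrong answer.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; the real value depends on the Gramps XML version.
NAMESPACE = "{http://gramps-project.org/xml/1.5.0/}"

# Made-up fragment standing in for an (uncompressed) Gramps XML file.
SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.5.0/">
  <events>
    <event handle="_e1" id="E0005"><type>Birth</type></event>
    <event handle="_e2" id="E0006"><type>Birth</type></event>
  </events>
</database>"""

root = ET.fromstring(SAMPLE)  # the whole document is now in memory
events = root.find(NAMESPACE + "events")
found = []
if events is not None:
    found = events.findall(NAMESPACE + "event")
    print('XML: Find all "event" records: %s' % len(found))
    print(sorted(e.get("id") for e in found))
```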
>
> This way was also used for an experimental 'Database compare and merge',
> see: http://sourceforge.net/mailarchive/message.php?msg_id=28173190
> http://gramps.1791082.n4.nabble.com/file/n3866064/GrampsCompareV02.py
>
> I have no idea how to translate this into current DB logic, but this sounds
> closer to SQL and hierarchical/level/factory logic.
>
>
> Jérôme
>
>
>
> Doug Blank wrote:
>
>> On Fri, Jun 22, 2012 at 10:51 AM, Nick Hall <[hidden email]>
>> wrote:
>>>
>>> Doug,
>>>
>>> I don't have much to add, but this will be very useful functionality.
>>>
>>> Your subversion analogy is good.  To sync two databases you need to find
>>> the "diff" between them.  Then apply the "patch" after some user input to
>>> resolve "conflicts".
>>>
>>> Technically we could open two databases at the same time.  Perhaps we
>>> could then add some "diff" functionality into gen.lib?
>>>
>>> Suppose I have a database that I export for someone else to work on.
>>> They make some changes and send me back a modified xml.  What I want to
>>> do is see the changes that they have made.
>>>
>>> I would like to open my original database, and then import their xml
>>> into another database (both open at the same time).  Then it would be
>>> nice to see their changes in a "graphical diff" view.  Finally I could
>>> decide how to sync the changes.  Could we use a Dictionary Database for
>>> the imported xml?
>>>
>>> In this case we could use Gramps handles to match objects, but would it
>>> be useful to add a UUID at this point?
>>>
>>> I can also see why a log file was suggested to keep track of changes
>>> rather than doing a "diff".  At the moment the Gramps log files don't
>>> store detailed enough information.  This is certainly worth
>>> consideration though.
>>>
>>> Regards,
>>>
>>> Nick.
>>>
>>
>> Thanks for the comments, all. I've just added
>> trunk/src/gen/merge/diff.py which is the beginning of tools to compute
>> differences between dbs and items. It does use the DictionaryDb in
>> its current state (Gerald is looking at it to make it better;
>> Gerald, I fixed a few minor issues there). That does load everything
>> into memory, which may not be viable... we may need to just make a
>> temp db somewhere on disk eventually.
>>
>> The diff_dbs(db1, db2) code is similar to the merge step of merge sort:
>> it sorts the handles and then goes through all of the items just once,
>> so the comparison pass is O(n1 + n2), creating a list of diffs and
>> missing objects in each db.
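
The single pass over sorted handles can be sketched like this. It is an illustration of the technique, not the code in diff.py: databases are modeled as plain dicts of handle to data, and the name diff_handles is made up.

```python
def diff_handles(db1, db2):
    """Compare two dicts of handle -> data with one pass over sorted handles.

    Returns (diffs, missing_from_1, missing_from_2).
    """
    h1, h2 = sorted(db1), sorted(db2)
    i = j = 0
    diffs, missing1, missing2 = [], [], []
    while i < len(h1) and j < len(h2):
        if h1[i] == h2[j]:
            if db1[h1[i]] != db2[h2[j]]:
                diffs.append(h1[i])      # same handle, different data
            i += 1
            j += 1
        elif h1[i] < h2[j]:
            missing2.append(h1[i])       # only in db1
            i += 1
        else:
            missing1.append(h2[j])       # only in db2
            j += 1
    missing2.extend(h1[i:])              # leftovers exist only in db1
    missing1.extend(h2[j:])              # leftovers exist only in db2
    return diffs, missing1, missing2


old = {"a1": "Tomas", "b2": "Anna", "c3": "Karl"}
new = {"a1": "Thomas", "c3": "Karl", "d4": "Eva"}
print(diff_handles(old, new))
```

Each handle is visited once, so the walk is linear in the two sizes; the two sorts that precede it are not.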
>>
>> To make this easier, I've added Object.to_json() methods to all Gramps
>> objects. Json is used as a generic name for a representation made up
>> of dicts, lists, and values. A Json representation (or something like
>> it) makes it easier to do programmatic comparisons between objects.
>> I've added a patch which adds the to_json methods to [1], and will
>> keep notes there.
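
As a rough picture of what a to_json() method on every object might look like, here is a toy version. The two classes are stand-ins invented for this sketch, not the real gen.lib classes, and the field names are assumptions.

```python
import json


class Surname:
    """Toy stand-in for a secondary object (not the real gen.lib class)."""

    def __init__(self, surname):
        self.surname = surname

    def to_json(self):
        return {"surname": self.surname}


class Person:
    """Toy stand-in for a primary object (not the real gen.lib class)."""

    def __init__(self, handle, first_name, surnames):
        self.handle = handle
        self.first_name = first_name
        self.surnames = surnames

    def to_json(self):
        # Nested objects convert themselves, so the result contains only
        # dicts, lists, and plain values.
        return {
            "handle": self.handle,
            "first_name": self.first_name,
            "surnames": [s.to_json() for s in self.surnames],
        }


old = Person("23984737847", "Tomas", [Surname("Blank")])
new = Person("23984737847", "Thomas", [Surname("Blank")])

# Plain data compares with == and serializes without custom encoders.
print(old.to_json() == new.to_json())
print(json.dumps(new.to_json(), sort_keys=True))
```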
>>
>> After applying the patch, one can test some basic functionality. In
>> the Python Gramplet, one can say:
>>
>> >>> from gen.merge.diff import *
>> >>> diffs, missing1, missing2 = diff_db_to_file(db, "/path/to/data.gramps")
>>
>>
>> That will load the "/path/to/data.gramps" into a DictionaryDb and then
>> do the comparison, print out differences to stdout, and return the
>> diffs, and the missing objects (missing in old, and missing in new).
>>
>> Next, I'll begin to allow a user to select what should be
>> added/deleted/edited, perhaps being able to use the Merge functions.
>> Also, will explore the datetime-based ideas, and a log of deleted
>> items. And UUID.
>>
>> But first, I wanted to get feedback on the idea of adding the
>> Object.to_json() method to every object.
>>
>> -Doug
>>
>> [1] - http://www.gramps-project.org/bugs/view.php?id=2623
>>
>>> On 19/06/12 13:44, Doug Blank wrote:
>>>>
>>>> On Tue, Jun 19, 2012 at 3:52 AM, Benny Malengier
>>>> <[hidden email]>  wrote:
>>>>>
>>>>> Deleted objects could be just a table with
>>>>> handle, object_type, date_delete.
>>>>
>>>> Ok, I think that you are correct that the only persistent additional
>>>> data required is this list of deleted objects. Let's call this a Log
>>>> file. This Log is important to carry out the syncing. I guess it should
>>>> be exported with the Gramps XML as well.
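
The deleted-objects table with its three columns could be prototyped as below. Using sqlite3 is purely an assumption to keep the sketch self-contained; it says nothing about how Gramps stores data. The helper names, the float timestamps, and the second handle are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deleted ("
    "handle TEXT PRIMARY KEY, object_type TEXT, date_delete REAL)"
)


def record_delete(handle, object_type, when):
    """Remember that an object was deleted, and when."""
    conn.execute("INSERT OR REPLACE INTO deleted VALUES (?, ?, ?)",
                 (handle, object_type, when))


def deleted_since(when):
    """Return (handle, object_type) rows deleted after `when`."""
    rows = conn.execute(
        "SELECT handle, object_type FROM deleted "
        "WHERE date_delete > ? ORDER BY date_delete", (when,))
    return rows.fetchall()


record_delete("34763478324", "person", 100.0)
record_delete("99887766554", "event", 200.0)   # made-up handle
print(deleted_since(150.0))
```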
>>>>
>>>>> Sync would be different from import. With sync, I understand that we
>>>>> can
>>>>> assume database handle values to be identical. This simplifies from
>>>>> merge
>>>>> where we don't have equal handles!
>>>>>
>>>>> Sync has a target database that needs updating, and an origin database
>>>>> from
>>>>> which we update.
>>>>> There could be a date: sync changes since _date_
>>>>>
>>>>> All objects with a change date after _date_ have been updated (or
>>>>> created),
>>>>> deleted objects table has the objects which are deleted.
>>>>>
>>>>> Sync is then :
>>>>> 1. backup revision of the family tree before sync
>>>>> 2. new objects =>  mark for insert
>>>>> 3. deleted objects, no change locally after delete date =>  mark for
>>>>> deletion
>>>>> 4. deleted objects, change locally =>  mark for user confirm for
>>>>> deletion
>>>>> 5. updated objects =>  do a diff on differences, mark origin values as
>>>>> new
>>>>> data
>>>>> 6. give overview to user on what will happen, ask for confirmation
>>>>> 7. do the sync
>>>>> 8. Tell user to do the inverse sync on the origin database so as to
>>>>> sync
>>>>> your local changes.
>>>>>
>>>>> If we don't have a sync date, we need to diff all objects every time.
>>>>> Note that for the certificate manager we would also need a diff
>>>>> structure on
>>>>> objects, see that mail discussion.
>>>>> Note that you might want to sync 4 different databases (two local
>>>>> computers,
>>>>> one cloud, one mobile), so _date_ above is not unique but is just 'user
>>>>> input' to make sync go faster (no compare of all objects).
>>>>
>>>> I think that this is all correct. However, I think I would like one
>>>> additional related feature: I want to be able to look over either the
>>>> proposed changes in detail or the changes once they have been made.
>>>>
>>>> I imagine something like the svn revision comparisons. I want to see
>>>> the commit message, and I want to see a diff between what it was and
>>>> what it became. Like:
>>>>
>>>> 2012/6/2 dsblank changed Person I0045:
>>>>    Reason: Was a typo based on S0045
>>>>    Before:
>>>>       Given name: Tomas
>>>>    After:
>>>>       Given name: Thomas
>>>> 2012/6/3 shelper deleted Event E0005:
>>>>    Reason: Duplicated with Event E0006
>>>>
>>>> I think that this functionality is more important when collaborating,
>>>> because one has no idea why these changes are being made. Imagine
>>>> doing collaborative code development without such tools. But even when
>>>> working alone, such repository tools are necessary. Examples:
>>>>
>>>> 1) You start out making changes based on a new source. Corrected
>>>> dates, new child, etc. Only to find out that this is the wrong family.
>>>> You want to revert the changes you made last Tuesday between 11am and
>>>> 11:30am.
>>>>
>>>> 2) A specific user changed a person's birthdate, and you don't feel
>>>> confident about it, so you want to ask about it. If you didn't have
>>>> the diff like above, you would not be aware of the specific change.
>>>> (We could generate a diff when we decide to sync or not, but then it
>>>> is gone).
>>>>
>>>> 3) A database gets corrupted. You can restore from backup, and apply
>>>> all of the changes using the Log file since then to get back to where
>>>> you were exactly.
>>>>
>>>> Even if we do add a more sophisticated Log file, I can see that some
>>>> people would not want to use it, so it could be optional, and Gramps
>>>> would work just as it is. It could take up some space, both on disk
>>>> and in the XML.
>>>>
>>>> -Doug
>>>>
>>>>> Benny
>>>>>
>>>>>
>>>>> 2012/6/19 Doug Blank<[hidden email]>
>>>>>>
>>>>>> On Mon, Jun 18, 2012 at 11:27 PM, Rob Healey<[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Greetings:
>>>>>>>
>>>>>>> As complex as this is, I do not have much to say!  The only comment
>>>>>>> that
>>>>>>> I
>>>>>>> have is:
>>>>>>>
>>>>>>> I would like to see the changes, edits, modifications, etc to the
>>>>>>> database
>>>>>>> to be in a git repository...
>>>>>>>
>>>>>>> But then we must think of the normal user too, are they going to be
>>>>>>> able
>>>>>>> to
>>>>>>> use our software if we add an increased level of technology to it?
>>>>>>>
>>>>>>> Will they be willing to use it if they do NOT understand about a
>>>>>>> git
>>>>>>> repo...
>>>>>>>
>>>>>>> I was also thinking that for a user to make use of gramps-connect in
>>>>>>> the
>>>>>>> first place would require a certain level of technology anyway!
>>>>>>
>>>>>> Rob,
>>>>>>
>>>>>> I don't think we want to use git... that was just an analogy for
>>>>>> having the same kinds of functionality. For example, any git
>>>>>> repository, fork or original, contains the full history.
>>>>>>
>>>>>> One can use git now, if you wanted to check your XML file in. But that
>>>>>> doesn't help at the object level. For example, if you changed a date
>>>>>> on an Event and you used git (or some other text-based revision
>>>>>> system) you'd have to export the full XML, make the commit which would
>>>>>> compute the diff. In the scenario I am proposing, a Gramps-patch would
>>>>>> know which object was changed, and could just update that single
>>>>>> object.
>>>>>>
>>>>>> I wouldn't want to use a system that is too complex. I imagine that
>>>>>> there will only be a couple of options: "Update from master" (aka, svn
>>>>>> update), and "Commit Changes" (aka, svn commit).
>>>>>>
>>>>>> -Doug
>>>>>>
>>>>>>> Sincerely yours,
>>>>>>> Rob G. Healey
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 18, 2012 at 7:26 AM, Doug Blank<[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Devs,
>>>>>>>>
>>>>>>>> First, http://Gramps-Connect.org is coming along (and kept
>>>>>>>> up-to-date
>>>>>>>> online):
>>>>>>>>
>>>>>>>> * can browse all data
>>>>>>>> * can edit all of the core data on all of the main objects
>>>>>>>> * can delete all of the main objects (currently just deletes, no
>>>>>>>> warning, no undo)
>>>>>>>> * can edit main parts of names, surnames
>>>>>>>> * can run any report, import, and export from the web (no tools
>>>>>>>> though)
>>>>>>>> * three levels of permissions:
>>>>>>>> ** not logged in: only see non-private, non-living data
>>>>>>>> ** logged in, but not superuser: can see all data, export and run
>>>>>>>> reports
>>>>>>>> ** logged in, superuser: can edit, delete, import data
>>>>>>>> * can edit notes with markup
>>>>>>>> * can add children to families, events to people, etc
>>>>>>>> * can change CSS of site
>>>>>>>> * can change site name
>>>>>>>>
>>>>>>>> This is of course still very much alpha, but I've put my family
>>>>>>>> tree
>>>>>>>> online and have started doing simple edits. It tastes like dogfood,
>>>>>>>> but either I'm getting used to it, or it gets a little better every
>>>>>>>> day
>>>>>>>> :)
>>>>>>>>
>>>>>>>> One of the first things that one wants to do is merge the changes
>>>>>>>> made
>>>>>>>> on-line with a master database. We all have made some initial notes
>>>>>>>> here:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge
>>>>>>>> http://www.gramps-project.org/bugs/view.php?id=5853
>>>>>>>> http://www.gramps-project.org/bugs/view.php?id=2623
>>>>>>>>
>>>>>>>> Now, I'm seriously thinking about how to do this, perhaps starting
>>>>>>>> with something simple. I'm thinking that there are three different
>>>>>>>> analogies:
>>>>>>>>
>>>>>>>> 1) diff and patch: keep track of all edits, deletes, and additions
>>>>>>>> and
>>>>>>>> create a type of patch file that gets applied to another database.
>>>>>>>>
>>>>>>>> 2) subversion: there is a master database, and all patches are
>>>>>>>> incorporated there. A special re-sync could get sent out to
>>>>>>>> checked-out versions.
>>>>>>>>
>>>>>>>> 3) git: all databases are full repositories, and can be forked,
>>>>>>>> merged.
>>>>>>>>
>>>>>>>> Perhaps starting with #1 is the easiest, and could lead to the
>>>>>>>> others.
>>>>>>>> But even with that option, there appears to be additional data that
>>>>>>>> needs to be kept. For example, say I delete an object in a database;
>>>>>>>> how do I keep track of that, to be able to send it to the other
>>>>>>>> database?
>>>>>>>>
>>>>>>>> At a bare minimum, it seems like we need a persistent representation
>>>>>>>> of:
>>>>>>>>
>>>>>>>> date-time, object-type, handle, change-made, commit-message
>>>>>>>> 2012/6/1 11:00:00, person, 34763478324, deleted, Duplicate person
>>>>>>>> 2012/6/1 12:00:00, person, 23984737847, created, Research on Monday
>>>>>>>> 2012/6/1 12:00:00, source, 38734763786, created, Research on Monday
>>>>>>>> 2012/6/1 12:00:00, citation, 34834767346, created, Research on
>>>>>>>> Monday
>>>>>>>> 2012/6/1 12:01:00, person, 23984737847, edited, Typo on given name
>>>>>>>>
>>>>>>>> (The commit message is not strictly needed, but I am finding it to
>>>>>>>> be
>>>>>>>> quite useful.) From this it seems that a patch-like file (written in
>>>>>>>> xml?) could be made, given a start date-time. Applying the patch may
>>>>>>>> come into conflicts, but that is a separate issue, I think. We could
>>>>>>>> also include more information here in the persistent storage (for
>>>>>>>> example, before and after serializations).
>>>>>>>>
>>>>>>>> So, if this sounds correct, where/how should the data be stored?
>>>>>>>> Perhaps just a text file that we can append onto would be safe and
>>>>>>>> sturdy? Should we reuse the XML representation of the data for the
>>>>>>>> patch? That sounds best, as we already can read/write those. But a
>>>>>>>> json file would be easy, too. (Could just use raw Python
>>>>>>>> serialization, but that could get messy when dealing with database
>>>>>>>> upgrades).
>>>>>>>>
>>>>>>>> Comments, ideas welcomed,
>>>>>>>>
>>>>>>>> -Doug
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sincerely yours,
>>>>>>> Rob G. Healey
>>>>>>>
>>>>>>>
>>>>>>