Database compare and merge

derHeinzi
Hello Developers,
This is quite a long message on a difficult matter, so bear with me.

Please find attached a Python script for comparing data in two Gramps XML files:
GrampsCompare.py (http://gramps.1791082.n4.nabble.com/file/n3832887/GrampsCompare.py)
The comparison is done in both databases starting with the "same" person, which you have to specify.
To test:
- Create Gramps XML files by unzipping two Gramps archives.
- Find the IDs of the same "key" person in both files.
- Start the script with the parameters firstFile, firstID, secondFile, secondID.
- Output is written to the screen. You might want to redirect it to a file.
It is not (yet) a tool to compare entries in two different databases, but you can already find the changes that have been made to a database you shared with another person, or to a backup you made some time ago.
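The preparation steps above can also be sketched in Python. This is only a minimal sketch and not the attached script: load_gramps_xml and find_by_id are hypothetical helper names, and it assumes that a Gramps archive is gzip-compressed XML and that the key person carries an "id" attribute.

```python
import gzip
import xml.etree.ElementTree as ET


def load_gramps_xml(path):
    """Parse a Gramps XML file, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip streams start with the two bytes 0x1f 0x8b
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as fh:
        return ET.parse(fh).getroot()


def find_by_id(root, gramps_id):
    """Return the first element whose "id" attribute equals gramps_id, or None."""
    for element in root.iter():
        if element.get("id") == gramps_id:
            return element
    return None
```

Checking only the attribute (and not the tag) keeps the sketch independent of the XML namespace, which may differ between Gramps versions.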

I'm currently working with Gramps 3.2.5-1 on WinXP and used XML files from this version for program development and testing. But as long as the attributes "id", "handle" and "hlink" are in the XML, it should work for other versions as well.
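To illustrate why these three attributes are enough, here is a minimal sketch (not taken from GrampsCompare.py) that indexes every node by its "handle" once and then resolves "hlink" references with a dictionary lookup. The function names are hypothetical, and the sketch assumes that each "hlink" value equals the "handle" of the node it refers to.

```python
import xml.etree.ElementTree as ET


def build_handle_index(root):
    """Map each "handle" attribute value to its element in a single pass."""
    return {el.get("handle"): el for el in root.iter() if el.get("handle")}


def referenced_nodes(node, index):
    """Yield (reference_tag, target_element) for every "hlink" at or below node."""
    for ref in node.iter():
        hlink = ref.get("hlink")
        if hlink is not None:
            # target is None when the handle is missing from this file
            yield ref.tag, index.get(hlink)
```

Building the index once and looking handles up in a dictionary avoids rescanning the tree for every reference, which is one plausible way to keep the run time low on larger files.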

My intention was to find a possible way towards a database compare and merge in Gramps. I have been following the devel and users mailing lists for quite a while now, and this matter came up from time to time, but I found no solution concept discussed and no hint that anyone is working on this right now.
GrampsConnect was mentioned in some posts, but I did not check if there is a concept or solution there yet.

After hacking together a quick database compare script that took several hours to complete a run on my database of about 3000 people, I tweaked the script so that it now finishes in less than a minute. Is this an amount of time that a user could accept for a complicated function to finish?
Since I'm fairly new to Python and am using this script as a way to learn the language, there may well be better ways to do things. But hey, I'm proud of what I achieved in these few evenings! :-)
I added tons of comments to the code to make you guys understand what the script is doing! So have fun with it. I don't claim any (copy)rights.

Now, what could a database compare and merge look like in the GUI, and what is still to be done?

First to the GUI of a compare and merge. If you look at the compare and merge window for a person in Gramps you see the person and related info side by side.
This could be changed to a display as shown in the attached cmpwin.png:
http://gramps.1791082.n4.nabble.com/file/n3832887/cmpwin.png
The changes are:
- For every subnode type (tag) in the database there is a "section" in the window (for a person, e.g., gender, names, ...).
- Only subnodes without handles are displayed for comparison.
- All subnodes referring to other nodes with handles are shown in two lists. (If a compare by script was performed beforehand, the list entries might show different colors for identical, changed and missing references.) The first list contains the items that "match" items in the other database; the second one shows the items that do not match or could match more than one item in the other database.
- There is a means to "link" nodes from both databases. (This is the button with the "=" between the lower lists. A broken or unbroken chain symbol on the button would be more appropriate.) If you look at the 2 marriage entries on the right side in this example, you have to decide which of these is already in the left database. So you select first the left and then one of the right marriage events to see the data in the quickview. If you find that they refer to the same marriage, you select both and press the "=" button to link these 2 entries. They will be moved to the lists above (matched information).
- If you double-click an entry in the matched information list (there could be a button for this), the content of the window is replaced by the content of the selected node (e.g. the family from the childof reference). There could be a "back" or even a "history" button for navigation, as in browsers.
- With the "+" button you add information from the 2nd database, with "<" you replace information.
- The window looks and works the same for all comparisons, no matter if events, persons, families, ... are compared.
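The two lists and the "=" button described above could be modelled roughly as follows. This is a sketch under the current assumption that both files share handles. split_references and link are hypothetical names, and the matched list holds (left, right) handle pairs.

```python
def split_references(left_hlinks, right_hlinks):
    """Partition handle references into matched and unmatched lists.

    Returns (matched, left_only, right_only); order of appearance is kept.
    """
    left_set = set(left_hlinks)
    right_set = set(right_hlinks)
    matched = [(h, h) for h in left_hlinks if h in right_set]
    left_only = [h for h in left_hlinks if h not in right_set]
    right_only = [h for h in right_hlinks if h not in left_set]
    return matched, left_only, right_only


def link(matched, left_only, right_only, left_handle, right_handle):
    """Model the "=" button: move a manually linked pair to the matched list."""
    left_only.remove(left_handle)
    right_only.remove(right_handle)
    matched.append((left_handle, right_handle))
```

Storing pairs rather than single handles means the same structure still works once matching no longer relies on identical handles.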

What do you think of this GUI concept (not the design, that's far from nice)? Do you think this could be a way to handle data from 2 databases?

Now to the question of what still has to be done.
- The standalone script first has to be improved and, in the end, integrated into Gramps.
- The comparison of the data nodes currently does not take a "closer look" at the data. The data itself has to be taken into account. E.g. currently only a check attributesDB1 == attributesDB2 is done. This should ignore the 'change' attribute.
- In the end this function must not rely on identical IDs or handles but has to check the data itself. This might get a bit complicated, but I think it can be solved and handled with the approach that you only have to check the data referred to by already matched nodes, even for a comparison with a Gramps XML file from an imported GEDCOM.
- It might be useful to generate an output file with the differences found, for further evaluation in Gramps or other software.
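The attribute check that ignores 'change' could look roughly like this. It is a minimal sketch, not the code of the attached script: the helper names are hypothetical, and treating 'change' as the only volatile attribute is an assumption.

```python
import xml.etree.ElementTree as ET

IGNORED_ATTRIBUTES = {"change"}  # assumption: only the timestamp is volatile


def comparable_attributes(element, ignored=IGNORED_ATTRIBUTES):
    """Return the element's attributes without the ignored ones."""
    return {k: v for k, v in element.attrib.items() if k not in ignored}


def attribute_differences(first, second, ignored=IGNORED_ATTRIBUTES):
    """Return sorted (name, first_value, second_value) tuples that differ."""
    a = comparable_attributes(first, ignored)
    b = comparable_attributes(second, ignored)
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]
```

The returned tuples are plain data, so they could be written to an output file for later evaluation.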

My apologies for this long post. I hope it was worth reading and clear enough to convey what I am trying to say. English is not my mother tongue, which makes it a little difficult to explain complicated things.

Kind regards and have fun
Heinz
Re: Database compare and merge

DS Blank
On Thu, Sep 22, 2011 at 5:47 AM, derHeinzi <[hidden email]> wrote:

> Hello Developers,
> this is quite a long message to a difficult matter. So bear with me.

Heinz,

Excellent! This is a great way to get the process going towards a real
solution. Thank you for this investment of time, energy, and ideas.

For the truly lazy (like me), here is a quick way to get started
exploring Heinz's code:

mkdir compare
cd compare/
cp ~/gramps/trunk/example/gramps/data.gramps .
mv data.gramps data.gz
gunzip data.gz
mv data data.xml
wget http://gramps.1791082.n4.nabble.com/file/n3832887/GrampsCompare.py
python GrampsCompare.py data.xml I30 data.xml I30
python GrampsCompare.py data.xml I30 data.xml I31

For an important function like this, the timing is not so critical,
and I'm sure that we can make this run *much* faster once we use the
actual databases.

I've just begun to look at the code. The next thing that would be
useful in the coding is to create some functions that enclose the
functionality, for example:

checkNodes = get_check_nodes(...)

Now, I must dive in to understand the big picture...

Thanks!

-Doug

_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Re: Database compare and merge

jerome
In reply to this post by derHeinzi
Hello Heinz,


Funny, I am also working on an experimental way to parse a Gramps XML
file. :)

When you say that you need "several hours to complete a run on database
with about 3000 people", I think you should really try 'lxml'!

Not tested, but at a glance, what you are trying to do might be done in
a few lines with 'etree' from 'lxml': iterate down to the grandchild
level, then call something like "handle.items()".
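One possible reading of this suggestion, as an untested-against-real-data sketch: walk two levels below the root and call items() on each element found there. grandchild_attributes is a hypothetical name. The standard-library ElementTree fallback is only there so the sketch runs without 'lxml' installed, since both offer the calls used here.

```python
try:
    from lxml import etree  # third-party parser
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same calls


def grandchild_attributes(root):
    """Yield (tag, attributes) for every grandchild of the root element."""
    for child in root:              # e.g. <people>, <events>
        for grandchild in child:    # e.g. <person>, <event>
            # items() returns the attributes as (name, value) pairs
            yield grandchild.tag, dict(grandchild.items())
```

On a real Gramps file the tags would carry the XML namespace prefix, so any tag comparison would have to account for that.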

I added comments on this experimental addon for parsing XML via 'lxml':

http://gramps-addons.svn.sourceforge.net/viewvc/gramps-addons/branches/gramps33/contrib/lxml

There is no package release because it is not finished/polished and does
not work on all operating systems. I tried to add some comments, so
improvements should be simpler. Feel free to have a look, improve, modify,
reuse it! ;)


Regards,
Jérôme


derHeinzi wrote:

> Hello Developers,
> this is quite a long message to a difficult matter. So bear with me.

Re: Database compare and merge

jerome
Heinz,


> finish in less than a minute. This should be an
> amount of time that a user could accept for a complicated function to
> finish?

You used ElementTree (Python), whose methods are close to those of ElementTree in 'lxml'. I have read somewhere that findall() should be used with caution; maybe you can gain a few seconds by using something other than findall()?

As I said in my previous answer, I was quickly able to generate something (less advanced, but with very fast parsing) with 'lxml' for DTD 1.4.0 under POSIX/Linux.

See this comparison between HTML parsers:
http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
I have the same 'feeling' with XML files. As a bonus, 'lxml' provides XPath and XSLT support.

Note that I also have some scripts (diff, xsl) that I use every time I need to migrate my database. Anyway, having a standalone tool might be a good idea for backup and data comparison.


Thank you.
Jérôme






--- On Sun 25.9.11, Jérôme <[hidden email]> wrote:

> From: Jérôme <[hidden email]>
> Subject: Re: [Gramps-devel] Database compare and merge
> To: "derHeinzi" <[hidden email]>
> Cc: [hidden email]
> Date: Sunday 25 September 2011, 17:30
> Hello Heinz,

Re: Database compare and merge

Ken B.
In reply to this post by derHeinzi
Hello All,

    Thank you Heinz, great work here.
    This looks like a great start towards something that is desperately needed in Gramps.
    For those of you lucky enough to be able to write the code: I do have a problem with using the IDs as a method of matching.
    I share data with a relative whom I have also persuaded to use Gramps. Recently I sent them a database; they added people, events and notes and returned a Gramps file to me. After I imported their database into a backup of my database, a lot of people, events and notes had become duplicate entries. On investigating, I found they had run the [Tools > Family Tree Processing > Reorder Gramps ID's] tool. This had allocated new IDs to people, events, notes etc.
    May I suggest that the comparison be between a person's name, date of birth, date of death, events, notes, sources etc. This would also be needed when comparing events, sources, notes etc.
    I do understand that this would be a long and probably slow process, so maybe this type of comparison could be an option.
    It would give a much better merge, and save the end user a lot of time fixing the database. The way I see such a tool is that it should do just that: save work and time.

Kind Regards,
Ken Benseman.
New Zealand.
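
Ken's suggestion of matching on the data itself rather than on Gramps IDs could be sketched roughly as follows. This is only an illustration under stated assumptions: `PersonFacts`, `match_key` and `pair_by_content` are hypothetical names, not part of Gramps or of GrampsCompare.py, and the exact-key comparison (with accents stripped) is a deliberately simple stand-in for whatever fuzzier matching a real tool would need.

```python
import unicodedata
from typing import NamedTuple, Optional


class PersonFacts(NamedTuple):
    """Hypothetical flat record for one person; not a Gramps class."""
    first_name: str
    surname: str
    birth: Optional[str]  # date as text, e.g. "1901-03-14", or None if unknown
    death: Optional[str]


def _norm(text: Optional[str]) -> str:
    """Lower-case, strip accents and collapse whitespace so trivial edits still match."""
    if not text:
        return ""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match_key(p: PersonFacts) -> tuple:
    """Key built only from the data; Gramps IDs and handles are deliberately left out."""
    return (_norm(p.surname), _norm(p.first_name), _norm(p.birth), _norm(p.death))


def pair_by_content(db1: dict, db2: dict) -> dict:
    """Map each ID in db1 to the list of IDs in db2 that have an identical match key."""
    index = {}
    for gid, person in db2.items():
        index.setdefault(match_key(person), []).append(gid)
    return {gid: index.get(match_key(person), []) for gid, person in db1.items()}


# Made-up data: same person, but the ID was changed by a reorder in the second file.
mine = {"I0001": PersonFacts("Anna", "Müller", "1901-03-14", None)}
theirs = {"I0457": PersonFacts("anna", "Muller ", "1901-03-14", None)}
print(pair_by_content(mine, theirs))
```

An empty list means "no candidate" and a list with several IDs means "ambiguous"; both cases would still need a human decision, which fits the two-list idea in Heinz's GUI proposal.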



Re: Database compare and merge

DS Blank
On Sun, Sep 25, 2011 at 4:27 PM, Ken <[hidden email]> wrote:

> Hello All,
>
>     Thank you Heinz, great work here.
>     This looks like a great start towards something that is desperately
> needed in Gramps.
>     For those of you lucky enough to be able to write the code: I do have a
> problem with using the IDs as a method of matching.
>     I share data with a relative whom I have also got to use Gramps. Recently
> I sent them a database; they added people, events and notes and returned a
> Gramps file to me. After I imported their database into a backup of my
> database, a lot of people, events and notes had become duplicate entries. On
> investigating, I found they had run the [Tools > Family Tree Processing >
> Reorder Gramps IDs] tool. This had allocated new IDs to people, events,
> notes etc.
>     May I suggest that the comparison be between a person's name, date of
> birth, date of death, events, notes, sources etc. The same kind of
> content-based comparison would also be needed when comparing events, sources
> and notes.
>     I do understand that this would be a long and probably slow process, so
> maybe this type of comparison could be an option.
>     It would give a much better merge, and save the end user a lot of time
> fixing the database. The way I see such a tool is to do just that: saving
> work and time.

Agreed that sometimes this might take some time, but that is ok.

In your example, the issue could almost be avoided completely with a
UID as outlined in [1]. Thus, even if a user changed some important
information (like name or ID), the UID would still be retained and
could be used to match. A nice UI could allow one to ignore those
minor/irrelevant differences (or include them), as found by a tool
like Heinz's.

I think this is the time to keep a set of UIDs for each person.

-Doug

[1] http://www.gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge
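
The idea of keeping a set of UIDs per person could be sketched like this. It is an assumption-laden illustration, not an existing Gramps feature: the thread only proposes UIDs, so the dictionary layout (record ID mapped to a set of UID strings) and the function names `match_by_uid` and `merged_uids` are hypothetical. The reading that a merged record should keep the UIDs of both sources is also an interpretation of "a set of UIDs", not something stated in the message.

```python
def match_by_uid(uids1: dict, uids2: dict) -> dict:
    """Pair records whose UID sets share at least one value.

    Both arguments map a record ID to a set of UID strings (hypothetical layout).
    """
    # Reverse index for the second database: UID -> record IDs carrying it.
    owner = {}
    for rec_id, uids in uids2.items():
        for uid in uids:
            owner.setdefault(uid, set()).add(rec_id)

    matches = {}
    for rec_id, uids in uids1.items():
        found = set()
        for uid in uids:
            found |= owner.get(uid, set())
        matches[rec_id] = sorted(found)
    return matches


def merged_uids(a: set, b: set) -> set:
    """Assumed merge rule: the surviving record keeps the UIDs of both sources."""
    return a | b


# Made-up data: the Gramps IDs differ, but one UID survives in both files.
mine = {"I0001": {"uid-a"}, "I0002": {"uid-b"}}
theirs = {"I0457": {"uid-a", "uid-x"}, "I0458": {"uid-c"}}
print(match_by_uid(mine, theirs))
```

Because the match never looks at the Gramps ID or the name, renaming a person or running the ID reorder tool would not break it, as long as the UIDs themselves are exported and re-imported unchanged.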


Re: Database compare and merge

Ken B.


On 26/09/11 10:21, Doug Blank wrote:
> [...]
> In your example, the issue could almost be avoided completely with a
> UID as outlined in [1]. Thus, even if a user changed some important
> information (like name or ID), the UID would still be retained and
> could be used to match. A nice UI could allow one to ignore those
> minor/irrelevant differences (or include them), as found by a tool
> like Heinz's.
> [...]
> [1] http://www.gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge

Thanks Doug.
I've read your link. It would be good if using a UID did avoid the problem.
I have tried the method you gave earlier in this thread, using a copy of the original file I sent and the file returned to me.
I get the following error:
Mark all entries related to changed person
Traceback (most recent call last):
  File "GrampsCompare.py", line 349, in <module>
    getRelatedNodes(db2, comparePerson, ns2l, compMainNodes, compRelNodes)
  File "GrampsCompare.py", line 93, in getRelatedNodes
    for subnode in node:    # Recursive call for all subnodes
TypeError: 'NoneType' object is not iterable

Ken.
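
The traceback above shows `getRelatedNodes` iterating over a `node` that is `None`. One plausible cause, not confirmed anywhere in the thread, is that the ID given for the second file no longer exists there because the IDs were reordered. A defensive lookup that fails with a readable message instead could look roughly like this. `find_person` and `local_name` are hypothetical helpers, not functions from GrampsCompare.py, and the sample XML (including its namespace URI) is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up miniature file; real Gramps XML has many more elements.
SAMPLE = """<database xmlns="http://gramps-project.org/xml/1.3.0/">
  <people>
    <person id="I0001" handle="_a1" change="1316684820"><gender>M</gender></person>
  </people>
</database>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def find_person(root: ET.Element, gramps_id: str) -> ET.Element:
    """Return the <person> element with the given id, or raise a readable error."""
    for elem in root.iter():
        if local_name(elem.tag) == "person" and elem.get("id") == gramps_id:
            return elem
    raise LookupError(
        f"no person with id {gramps_id!r} in this file; "
        "the IDs may have been reordered"
    )


root = ET.fromstring(SAMPLE)
print(find_person(root, "I0001").get("handle"))
try:
    find_person(root, "I0457")  # an ID that does not exist in the sample
except LookupError as err:
    print("error:", err)
```

Checking the start person in both files before the recursive walk begins would turn the `TypeError` into a message that tells the user which ID to look up again.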


Re: Database compare and merge

derHeinzi
In reply to this post by DS Blank

Hello Doug,

thank you for your encouragement. This is, of course, only a first step towards the goal. There might be things I overlooked in the concept, and it might be more complicated to achieve the goal than I currently assume, but if this serves as an initial impetus towards a solution I would be happy.

During the weekend it occurred to me that creating a "Difference report" might be a first step toward integration into Gramps.
From within Gramps you would select a person to run the report for, then another database and the "same" person in that database, and the report would print the differences?

Kind regards,
Heinz
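
The core of such a "Difference report" could be sketched as below: flatten the two "same" person elements and list every value that differs. This is a rough illustration only. `describe` and `diff_report` are hypothetical names, the two sample elements use simplified made-up tags rather than the real Gramps schema, and skipping the `change` attribute is an assumption that it only records the last-modified time.

```python
import xml.etree.ElementTree as ET

IGNORED = {"change"}  # assumption: 'change' is a modification stamp, not data


def describe(elem: ET.Element) -> dict:
    """Flatten an element into {path: value} for its attributes and text."""
    out = {}

    def walk(node: ET.Element, path: str) -> None:
        for name, value in node.attrib.items():
            if name not in IGNORED:
                out[f"{path}@{name}"] = value
        text = (node.text or "").strip()
        if text:
            out[path] = text
        # The child position keeps repeated tags from overwriting each other.
        for pos, child in enumerate(node):
            walk(child, f"{path}/{child.tag}[{pos}]")

    walk(elem, elem.tag)
    return out


def diff_report(a: ET.Element, b: ET.Element) -> list:
    """Return one line per path whose value differs between the two elements."""
    left, right = describe(a), describe(b)
    lines = []
    for key in sorted(left.keys() | right.keys()):
        if left.get(key) != right.get(key):
            lines.append(f"{key}: {left.get(key)!r} -> {right.get(key)!r}")
    return lines


# Made-up example: a corrected first name and an added nickname.
old = ET.fromstring(
    '<person id="I0001" change="1"><gender>M</gender><first>Jon</first></person>'
)
new = ET.fromstring(
    '<person id="I0001" change="2"><gender>M</gender><first>John</first>'
    "<nick>Jack</nick></person>"
)
for line in diff_report(old, new):
    print(line)
```

Comparing by child position is fragile when entries are inserted in the middle of a list; a real report would first have to pair up the referenced nodes, which is the matching problem discussed earlier in the thread.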

--- Doug Blank <[hidden email]> wrote on Thu, 22.9.2011:

> From: Doug Blank <[hidden email]>
> Subject: Re: [Gramps-devel] Database compare and merge
> To: "derHeinzi" <[hidden email]>
> CC: [hidden email]
> Date: Thursday, 22 September 2011, 13:31
> On Thu, Sep 22, 2011 at 5:47 AM, derHeinzi <[hidden email]> wrote:
> > Hello Developers,
> > this is quite a long message to a difficult matter. So
> bear with me.
> >
> > Please find attached a python script for comparing
> data in 2 Gramps xml
> > files.
> > http://gramps.1791082.n4.nabble.com/file/n3832887/GrampsCompare.py
> > GrampsCompare.py
> > The comparison is done in both databases starting with
> the "same" person,
> > which you have to specify.
> > For test:
> > - Create gramps xml-files by unzipping two gramps
> archives.
> > - Find IDs of the same "key"-person in both files.
> > - Start script with parameters firstFile, firstID,
> secondFile, secondID
> > - Output is written to screen. You might want to
> redirect to file.
> > It is not (yet) a tool to compare entries in 2
> different databases but you
> > can already find the changes that have been done to a
> database you shared
> > with some other person or to a backup you did some
> time ago.
> >
> > I'm currently working with Gramps 3.2.5-1 on WinXP and
> used xml-files from
> > this version for program development and test. But as
> long as the attributes
> > "id" "handle" and "hlink" are in the xml it should
> work for other versions
> > as well.
> >
> > My intention was to find a possible way to a database
> compare and merge in
> > Gramps. Following the devs and users mailing lists for
> quite a while now
> > this matter came up from time to time, but I found no
> concept of solution
> > discussed and no hint that someone is working on this
> right now.
> > GrampsConnect was mentioned in some posts, but I did
> not check if there is a
> > concept or solution there yet.
> >
> > After hacking a quick database compare script which
> took several hours to
> > complete a run on my database with about 3000 people,
> I tweaked the script
> > to now finish in less than a minute. This should be an
> amount of time that a
> > user could accept for a complicated function to
> finish?
> > Since I'm fairly new to Python and using this script
> as a way to learn the
> > language there might be even better ways to do things.
> But hey, I'm proud of
> > what I achieved in these few evenings! :-)
> > I added tons of comments to the code to make you guys
> understand what the
> > script is doing! So have fun with it. I don't claim
> any (copy)rights.
> >
> > Now, what could a database compare and merge look like
> in the gui and what
> > is still do be done.
> >
> > First to the GUI of a compare and merge. If you look
> at the compare and
> > merge window for a person in Gramps you see the person
> and related info side
> > by side.
> > This could be changed to a display as shown in
> attached cmpwin.png.
> > http://gramps.1791082.n4.nabble.com/file/n3832887/cmpwin.png
> cmpwin.png
> > The changes are:
> > - For every subnode type (tag) in the database there is a "section" in the
> > window (for a person, e.g. gender, names, ...).
> > - Only subnodes without handles are displayed for comparison.
> > - All subnodes referring to other nodes with handles are shown in two lists.
> > (If a compare by script was performed beforehand, the list entries might show
> > different colors for identical, changed and missing references.) The first
> > list contains the items that "match" items in the other database; the second
> > one shows the items that do not match, or that could match more than one item
> > in the other database.
> > - There is a means to "link" nodes from both databases: the button with the
> > "=" between the lower lists. (A broken or unbroken chain symbol on the button
> > would be more appropriate.) In this example there are 2 marriage entries on
> > the right side, and you have to decide which of them is already in the left
> > database. So you first select the left and then one of the right marriage
> > events to see the data in the quickview. If you find that they refer to the
> > same marriage, you select both and press the "=" button to link these 2
> > entries. They will be moved to the lists above (matched information).
> > - If you double-click an entry in the matched information list (there could
> > be a button for this), the content of the window is replaced by the content
> > of the selected node (e.g. the family from the childof reference). There
> > could be a "back" or even a "history" button for navigation, like in
> > browsers.
> > - With the "+" button you add information from the 2nd database; with "<"
> > you replace information.
> > - The window looks and works the same for all comparisons, no matter whether
> > events, persons, families, ... are compared.
> >
> > What do you think of this GUI concept (not the design; that's far from
> > nice)? Do you think this could be a way to handle data from 2 databases?
> >
> > Now to the question of what still has to be done.
> > - The standalone script first has to be improved and, in the end, be
> > integrated into Gramps.
> > - The comparison of the data nodes currently does not take a "closer look"
> > at the data. The data itself has to be taken into account. E.g. currently
> > only a check attributesDB1 == attributesDB2 is done. This should ignore the
> > 'change' attribute.
> > - In the end this function must not rely on identical IDs or handles but has
> > to check the data itself. This might get a bit complicated, but I think it
> > can be solved and handled with the approach that you only have to check the
> > data referred to by already matched nodes. Even for comparison with a Gramps
> > xml from an imported GEDCOM.
> > - It might be useful to generate an output file with the differences found,
> > for further evaluation in Gramps or other software?
> >
> > My apologies for this long post. I hope it was worth reading and clear
> > enough to make you understand what I am trying to say. My mother tongue is
> > not English, which makes it a little difficult to explain complicated things.
> >
> > Kind regards and have fun
> > Heinz
>
> Heinz,
>
> Excellent! This is a great way to get the process going towards a real
> solution. Thank you for this investment of time, energy, and ideas.
>
> For the truly lazy (like me), here is a quick way to get started
> exploring Heinz's code:
>
> mkdir compare
> cd compare/
> cp ~/gramps/trunk/example/gramps/data.gramps .
> mv data.gramps data.gz
> gunzip data.gz
> mv data data.xml
> wget http://gramps.1791082.n4.nabble.com/file/n3832887/GrampsCompare.py
> python GrampsCompare.py data.xml I30 data.xml I30
> python GrampsCompare.py data.xml I30 data.xml I31
>
> For an important function like this, the timing is not so critical,
> and I'm sure that we can make this run *much* faster once we use the
> actual databases.
>
> I've just begun to look at the code. The next thing that would be
> useful in the coding is to create some functions that enclose the
> functionality, for example:
>
> checkNodes = get_check_nodes(...)
>
> Now, I must dive in to understand the big picture...
>
> Thanks!
>
> -Doug
>
> >
> > --
> > View this message in context: http://gramps.1791082.n4.nabble.com/Database-compare-and-merge-tp3832887p3832887.html
> > Sent from the GRAMPS - Dev mailing list archive at
> Nabble.com.
> >
> >
> ------------------------------------------------------------------------------
> > All the data continuously generated in your IT
> infrastructure contains a
> > definitive record of customers, application
> performance, security
> > threats, fraudulent activity and more. Splunk takes
> this data and makes
> > sense of it. Business sense. IT sense. Common sense.
> > http://p.sf.net/sfu/splunk-d2dcopy1
> > _______________________________________________
> > Gramps-devel mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/gramps-devel
> >
>

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2dcopy1
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: Database compare and merge

derHeinzi
In reply to this post by jerome

Hi Jerome,

with
> Funny, I am also working on an experimental way to parse a
> Gramps XML file. :)
you are referring to the display as XSLT? I saw that discussion on the list.

> When you say that you need "several hours to
> complete a run on database with about 3000 people", I think
> you should really try 'lxml' !

It seems you did not read the next sentence? The runtime had nothing to do with the XML access. I've got a database with about 3000 persons, and there are about 75000 elements (nodes) to be handled. So the looping technique plays a major role here, and as a beginner in Python I had to find a solution for this. And I did. :-)
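
The post does not say which change brought the runtime down. One common cause of very long runs on this many nodes is scanning the whole tree again for every reference; a minimal sketch of the usual remedy, a one-time dictionary index by handle, could look like this. The attribute names handle and hlink are the ones named in the thread; the sample XML and the function names are made up for illustration, and the XML namespace a real Gramps file may declare is left out.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a Gramps XML file (hypothetical, no namespace).
SAMPLE = """
<database>
  <people>
    <person id="I30" handle="_p1"><eventref hlink="_e1"/></person>
    <person id="I31" handle="_p2"><eventref hlink="_e2"/></person>
  </people>
  <events>
    <event id="E1" handle="_e1"><type>Birth</type></event>
    <event id="E2" handle="_e2"><type>Death</type></event>
  </events>
</database>
"""

def build_handle_index(root):
    """Walk the tree once and map each handle to its element."""
    return {el.get("handle"): el for el in root.iter() if el.get("handle")}

def resolve_links(node, index):
    """Return the elements that node refers to through hlink attributes."""
    return [index[ref.get("hlink")] for ref in node.iter()
            if ref.get("hlink") in index]

root = ET.fromstring(SAMPLE)
index = build_handle_index(root)          # one pass over all nodes
person = index["_p1"]                     # dictionary lookup, no loop
linked = resolve_links(person, index)
print([el.get("id") for el in linked])
```

With the index in place, following a reference is a dictionary lookup instead of another loop over all nodes.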

> Not tested, but at a glance what you are trying to do might
> be done in a few lines with 'etree' from 'lxml': iterate down
> to the greatchild level, then call something like
> "handle.items()".

No. That might work to compare two snapshots of the same database, but my (final) target is a compare tool that also works on completely different database contents.

> I added comments on this experimental addon for parsing XML
> via 'lxml':

I'll have a look at 'lxml' when time permits. Thank you for the link!

> There is no package release because it is not
> finished/polished and does not work on all OSes. I try to add
> some comments, so improvements should be simpler. Feel
> free to have a look, improve, modify, reuse it! ;)

Thank you, Jerome.

Kind regards,
Heinz



Re: Database compare and merge

derHeinzi
In reply to this post by Ken B.

Hi Ken,

[I edited the linebreaks in the quotes of your mail]
> This looks like a great start towards something that is desperately needed in Gramps.

That this is 'desperately needed' in Gramps was my impression too. That's why I am trying to get it going somehow. :-)

> For those of you lucky enough to be able to write the code, I do have a problem with using the IDs as a method of matching.
> I share data with a relative I've also got to use Gramps.
> Recently I sent a database and they had added people, events and notes and returned a Gramps file to me. After I imported their database into a backup of my database, a lot of people, events and notes had become duplicate entries. On doing some investigating I found they had run the [Tools > Family Tree Processing > Reorder Gramps IDs] tool. This had allocated new IDs to people, events, notes etc.
>
> May I suggest that the comparison be between a person's name, date of birth, date of death, events, notes, sources etc. This would also be needed when comparing events, sources and notes etc.
       
Ken, this is exactly what I'm trying to achieve. That's why on starting the script you have to give the ID of the person in the first and the second file. From there the compare should (in the final version) disregard all database specific information and completely rely on the data itself. But it's a long way to get there.
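
To illustrate what relying on the data itself could mean, here is a minimal sketch, not the script's actual logic. It assumes each person has already been reduced to a plain dictionary; the field names name, birth and death and the sample people are made up for the example.

```python
def person_key(person):
    """Build a comparison key from the data itself, ignoring IDs and handles."""
    return (
        person.get("name", "").strip().lower(),
        person.get("birth", ""),
        person.get("death", ""),
    )

def match_people(db1, db2):
    """Pair up people whose data keys are equal; report the rest."""
    by_key = {}
    for p in db2:
        by_key.setdefault(person_key(p), []).append(p)
    matched, unmatched = [], []
    for p in db1:
        candidates = by_key.get(person_key(p), [])
        if len(candidates) == 1:
            matched.append((p["id"], candidates[0]["id"]))
        else:
            # no candidate or several: leave the decision to the user
            unmatched.append(p["id"])
    return matched, unmatched

mine = [
    {"id": "I0001", "name": "Anna Meier", "birth": "1901-02-03", "death": "1970-01-01"},
    {"id": "I0002", "name": "Karl Meier", "birth": "1899-05-06", "death": ""},
]
# The same people after their IDs were reordered, plus one new person.
theirs = [
    {"id": "I0100", "name": "Karl Meier", "birth": "1899-05-06", "death": ""},
    {"id": "I0101", "name": "anna meier", "birth": "1901-02-03", "death": "1970-01-01"},
    {"id": "I0102", "name": "Eva Meier", "birth": "1925-07-08", "death": ""},
]
matched, unmatched = match_people(mine, theirs)
print(matched)
print(unmatched)
```

Both people are paired although every ID differs. Anything with no candidate or with several candidates is left unmatched on purpose, so the decision stays with the user.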

Kind regards,
Heinz



Re: Database compare and merge

derHeinzi
In reply to this post by DS Blank

Doug,

you are right that a UID might help for comparison of two database instances that are based on the same root db and branched from there (e.g. a relative working on a copy). But a compare and merge tool should also be able to handle e.g. a GEDCOM import into an empty database as a compared database. For that a UID would give no benefit.
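
A small sketch of this point, assuming records carry an optional "uid" field (a hypothetical name; the thread does not define the storage format): UID matching pairs everything in a branched copy, but pairs nothing when the other side comes from an import without UIDs.

```python
def match_by_uid(db1, db2):
    """Pair records sharing a UID; return pairs plus the IDs left on each side."""
    uid_index = {r["uid"]: r for r in db2 if r.get("uid")}
    matched, only1, used = [], [], set()
    for r in db1:
        other = uid_index.get(r.get("uid"))
        if other is None:
            only1.append(r["id"])
        else:
            matched.append((r["id"], other["id"]))
            used.add(other["id"])
    only2 = [r["id"] for r in db2 if r["id"] not in used]
    return matched, only1, only2

mine = [{"id": "I0001", "uid": "u-1"}, {"id": "I0002", "uid": "u-2"}]
# A relative's copy of the same tree: IDs reordered, UIDs retained.
branched = [{"id": "I0100", "uid": "u-2"}, {"id": "I0101", "uid": "u-1"},
            {"id": "I0102", "uid": "u-3"}]
# A fresh GEDCOM import: no shared UIDs at all.
imported = [{"id": "I1"}, {"id": "I2"}]

print(match_by_uid(mine, branched))
print(match_by_uid(mine, imported))
```

In the second case everything ends up in the left-over lists, so a data-based comparison is still needed there.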

I used my script for a compare between the Gramps example database of version 3.2 and 3.3 and found a lot of changed name tags in there. Also for an imported database some information might be stored with a different tag. Therefore (in the final version) the tool should have something like a tag dictionary in which you can say: Compare data in tag 'last' from my db to data in tag 'surname' in the compared db.

Just to add this to the 'things to do' for a database compare and merge (or a difference report).
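
The tag dictionary could be as simple as the sketch below. Only the 'last' / 'surname' pair comes from the example above; the flat tag-to-text dictionaries, the function name and the sample data are assumptions for illustration.

```python
# my tag -> tag used in the compared database
TAG_MAP = {"last": "surname"}

def compare_with_tag_map(mine, theirs, tag_map):
    """Compare two flat tag->text dicts, translating tag names first."""
    diffs = {}
    for tag, value in mine.items():
        other_tag = tag_map.get(tag, tag)   # fall back to the same tag name
        other_value = theirs.get(other_tag)
        if other_value != value:
            diffs[tag] = (value, other_value)
    return diffs

mine = {"first": "Anna", "last": "Meier"}
theirs = {"first": "Anna", "surname": "Meier"}

print(compare_with_tag_map(mine, theirs, {}))        # without the dictionary
print(compare_with_tag_map(mine, theirs, TAG_MAP))   # with the dictionary
```

Without the mapping the surname is reported as missing in the compared database; with it, no difference is reported.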

Kind regards,
Heinz

--- Doug Blank <[hidden email]> wrote on Sun, 25.9.2011:

> From: Doug Blank <[hidden email]>
> Subject: Re: [Gramps-devel] Database compare and merge
> To: "Ken" <[hidden email]>
> CC: [hidden email]
> Date: Sunday, 25 September 2011, 23:21
> On Sun, Sep 25, 2011 at 4:27 PM, Ken <[hidden email]> wrote:
> > Hello All,
> >
> >     Thank you Heinz, great work here.
> >     This looks like a great start towards something that is desperately
> > needed in Gramps.
> >     For those of you lucky enough to be able to write the code, I do have a
> > problem with using the IDs as a method of matching.
> >     I share data with a relative I've also got to use Gramps. Recently I
> > sent a database and they had added people, events and notes and returned a
> > Gramps file to me. After I imported their database into a backup of my
> > database, a lot of people, events and notes had become duplicate entries. On
> > doing some investigating I found they had run the
> > [Tools > Family Tree Processing > Reorder Gramps IDs] tool. This had
> > allocated new IDs to people, events, notes etc.
> >     May I suggest that the comparison be between a person's name, date of
> > birth, date of death, events, notes, sources etc. This would also be needed
> > when comparing events, sources and notes etc.
> >     I do understand that this would be a long and probably slow process, so
> > maybe this type of comparison could be an option.
> >     It would give a much better merge, and save the end user a lot of time
> > fixing the database. The way I see such a tool is to do just that, saving
> > work and time.
>
> Agreed that sometimes this might take some time, but that is ok.
>
> In your example, the issue could almost be avoided completely with a
> UID as outlined in [1]. Thus, even if a user changed some important
> information (like name or ID), the UID would still be retained and
> could be used to match. A nice UI could allow one to ignore those
> minor/irrelevant differences (or include them), as found by a tool
> like Heinz's.
>
> I think this is the time to keep a set of UIDs for each person.
>
> -Doug
>
> [1] http://www.gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge
>
> > Kind Regards,
> > Ken Benseman.
> > New Zealand.
> >
> > On 22/09/11 21:47, derHeinzi wrote:
> >


Re: Database compare and merge

jerome
In reply to this post by derHeinzi
Hi Heinz,

>> Funny, I am also working on an experimental way to parse a
>> Gramps XML file. :)
> you are referring to the display as XSLT? I saw that discussion on the list.

Yes, but also the way of parsing once the XML is in memory.

> as a beginner in Python I had to find a solution for this

I am far from being a Python expert or coding wizard, and I am not able
to translate your complete code into my own logic. But I agree that
making our Gramps XML more "accessible" could make collaborative work
between Gramps users more active. I would love to see a way (online or
offline) for family members to work together on common genealogical
content. ;)

> It seems you did not read the next sentence? The runtime had nothing to do with the xml access. I've got a database with about 3000 persons and there are about 75000 elements (nodes) to be handled. So the looping technique

I have seen your multiple "parent/child" node tests.
I just think that you are using a DOM technique with ElementTree. Since
'lxml' is written in C it is faster, and if you use fewer loops you may
gain further performance. Instead of the looping technique (which I also
often use...), why not use direct XPath matching?
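
As a rough illustration of direct matching, here is a sketch using the limited XPath subset of the standard library's xml.etree.ElementTree (lxml offers fuller XPath through its xpath() method). The sample XML is made up and leaves out the XML namespace a real Gramps file may declare.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a Gramps XML file (hypothetical, no namespace).
SAMPLE = """
<database>
  <people>
    <person id="I30" handle="_p1"><gender>M</gender><eventref hlink="_e1"/></person>
    <person id="I31" handle="_p2"><gender>F</gender></person>
  </people>
  <events>
    <event id="E1" handle="_e1"><type>Birth</type></event>
  </events>
</database>
"""
root = ET.fromstring(SAMPLE)

# Direct lookup by attribute instead of an explicit loop over all persons.
person = root.find(".//person[@id='I30']")
gender = person.findtext("gender")

# Follow each hlink with a second attribute lookup.
event_types = [
    root.find(f".//*[@handle='{ref.get('hlink')}']").findtext("type")
    for ref in person.findall("eventref")
]
print(gender, event_types)
```

This makes the code shorter, but each find() still walks the tree internally, so on its own it is not a substitute for reducing the number of passes over the data.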

>> to iter/go to the greatchild level, then to call something like
>> "handle.items()".
>
> No. That might work to compare two snapshots of the same database, but my (final) target is a compare tool that also works on completely different database contents.

Well, this is what I do before my database migrations
(as a comparison: the content should be the same ...):
1. parse the complete Gramps XML
2. generate one HTML file with the content
3. finally, run the 'diff' tool

Agreed, it is a quick and dirty method, but by using a common HTML model
I get rid of the 'non-idempotent XML file after an import/export' issue.
Michiel shared a way of sorting records by IDs, and I guess the result
may be the same as with your script?
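
The three steps above can be sketched with the standard library alone. Note the substitutions: plain text lines stand in for the HTML file, difflib stands in for the external 'diff' tool, and the sample XML strings are made up.

```python
import difflib
import xml.etree.ElementTree as ET

def normalized_lines(xml_text, ignore=("change",)):
    """Flatten a document to one line per element: tag, kept attributes, text."""
    lines = []
    for el in ET.fromstring(xml_text).iter():
        attrs = " ".join(f"{k}={v}" for k, v in sorted(el.attrib.items())
                         if k not in ignore)
        text = (el.text or "").strip()
        lines.append(" ".join(part for part in (el.tag, attrs, text) if part))
    return lines

before = '<database><person id="I30" change="100"><gender>M</gender></person></database>'
after = '<database><person id="I30" change="200"><gender>F</gender></person></database>'

diff = list(difflib.unified_diff(normalized_lines(before),
                                 normalized_lines(after),
                                 "before", "after", lineterm=""))
print("\n".join(diff))
```

Because both files pass through the same normalization, differences that come only from how the XML was written out, such as attribute order or the dropped 'change' attribute, do not show up in the diff.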

In fact, in this type of hierarchical flat database, I think that
comparing all attributes (hlink, handle, modification date) below a
certain level could give a clue about any database change, i.e. data
synchronization between two different authors of the same database.
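
A minimal sketch of such an attribute comparison. It follows Heinz's to-do note that the 'change' attribute should be ignored in the equality test; whether to ignore it or to use it as a hint is a design choice. The function name and the sample values are made up.

```python
def attribute_diff(attrs1, attrs2, ignore=("change",)):
    """Return {name: (value1, value2)} for the attributes that differ."""
    names = (set(attrs1) | set(attrs2)) - set(ignore)
    return {n: (attrs1.get(n), attrs2.get(n))
            for n in sorted(names)
            if attrs1.get(n) != attrs2.get(n)}

a = {"id": "I30", "handle": "_p1", "change": "1316700000"}
b = {"id": "I30", "handle": "_p1", "change": "1316990000"}
c = {"id": "I99", "handle": "_p1", "change": "1316990000"}

print(a == b)                # plain equality is tripped up by 'change'
print(attribute_diff(a, b))  # nothing left once 'change' is ignored
print(attribute_diff(a, c))  # a real difference is still reported
```

Passing ignore=() instead would keep the modification stamp in the comparison, which is closer to using it as a clue that something changed.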


Thanks!
Jérôme






Heinz Brinker wrote:




Re: Database compare and merge

jerome
> In fact, in this type of hierarchical flat database, I think that
> comparing all attributes (hlink, handle, modification date) below a
> certain level could give a clue about any database change, i.e. data
> synchronization between two different authors of the same database.

Arrghh, incomplete description ... (the sentences above can be read in
multiple ways once translated). I also need to make clear that I agree
with Doug and with what you said! :)

In my mind, data synchronization means merging two different data sets
after comparison. ;)






Re: Database compare and merge

jerome
In reply to this post by DS Blank
> I think this is the time to keep a set of UIDs for each
> person.
>
> [1] http://www.gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge

Why only on the person object?

I suppose that having UIDs only on persons also means that one of any pair of possibly duplicated events will be discarded, and that there will be some family issues...

Would it not be "simpler" to also provide UIDs on event and family objects?

Places, sources, notes, media objects or repositories do not need them if they are related to one of the above objects.


Jérôme





Re: Database compare and merge

DS Blank
On Mon, Sep 26, 2011 at 11:10 AM, jerome <[hidden email]> wrote:
>> I think this is the time to keep a set of UIDs for each
>> person.
>>
>> [1] http://www.gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge
>
> Why only on person objects?

Yes, you are correct: I meant for each object.

-Doug
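
To make the UID idea a bit more concrete, here is a minimal sketch of how matching could work once every object (person, event, family, ...) carries a set of UIDs. Everything in it is an assumption for illustration: the function name `match_by_uid`, the handle-to-UID-set dictionaries and the sample UID strings are hypothetical, and nothing here is taken from GEPS 009 or from Heinz's script.

```python
from collections import defaultdict


def match_by_uid(objects_a, objects_b):
    """Pair objects from two databases that share at least one UID.

    Both arguments map a handle to a set of UIDs (hypothetical layout).
    Returns (matches, unmatched): matches maps a handle from A to the
    sorted list of handles from B that share a UID with it; unmatched
    lists the handles from A that found no partner.
    """
    # Index database B by UID so that every lookup is a dictionary access.
    by_uid = defaultdict(set)
    for handle_b, uids in objects_b.items():
        for uid in uids:
            by_uid[uid].add(handle_b)

    matches, unmatched = {}, []
    for handle_a, uids in objects_a.items():
        found = set()
        for uid in uids:
            found |= by_uid.get(uid, set())
        if found:
            matches[handle_a] = sorted(found)
        else:
            unmatched.append(handle_a)
    return matches, unmatched


# Made-up sample data: handles and UIDs are placeholders.
mine = {"_a1": {"uid-1"}, "_a2": {"uid-2", "uid-9"}, "_a3": {"uid-3"}}
theirs = {"_b7": {"uid-1"}, "_b8": {"uid-9"}, "_b9": {"uid-4"}}
matches, unmatched = match_by_uid(mine, theirs)
print(matches)    # {'_a1': ['_b7'], '_a2': ['_b8']}
print(unmatched)  # ['_a3']
```

Because the match only looks at UIDs, it would not be affected by reordered Gramps IDs or a changed name. An object that matches more than one object in the other database shows up with several handles in its list, so a UI could ask the user to decide.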



Re: Database compare and merge

Benny Malengier
In reply to this post by derHeinzi


2011/9/26 Heinz Brinker <[hidden email]>
<snip>

> I used my script to compare the Gramps example databases of versions 3.2 and 3.3 and found a lot of changed name tags. In an imported database, too, some information might be stored under a different tag. Therefore (in the final version) the tool should have something like a tag dictionary in which you can say: compare data in tag 'last' from my db to data in tag 'surname' in the compared db.

This is not maintainable. A better way, in my opinion, is:
1/ Set the xml version the script requires to the one you target.
2/ If the file has a newer xml version, give an error saying that it is more recent and hence not supported (download an updated script).
3/ If it is older, either
     A/ tell the user to import it into an empty family tree in the most current Gramps and export it to a new, current xml, or
     B/ do the above automatically, using subprocess (http://docs.python.org/library/subprocess.html) to start something like gramps -I input.xml -O ~/tmp/inputnew.xml
then work with inputnew.xml if the subprocess returned successfully.

This workflow will give you the fewest worries (letting Gramps upgrade the old xml is the most trustworthy route), and as Gramps is open source, it is not a problem (everybody is allowed to use the most recent version).

Benny
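
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: that the xml version can be read from the namespace of the root element (something like http://gramps-project.org/xml/1.4.0/), that `TARGET_VERSION` is whatever version the script is written for, and that `gramps -i <file> -e <file>` performs the import and export. The exact command-line flags should be checked against `gramps --help`; the helper names are made up.

```python
import subprocess
import xml.etree.ElementTree as ET

# Assumption: the xml version this script was written and tested for.
TARGET_VERSION = (1, 4, 0)


def parse_version(namespace):
    """Turn 'http://gramps-project.org/xml/1.4.0/' into (1, 4, 0)."""
    last = namespace.rstrip("/").rsplit("/", 1)[-1]
    return tuple(int(part) for part in last.split("."))


def root_namespace(path):
    """Return the namespace URI of the root element, reading only its start tag."""
    for _event, elem in ET.iterparse(path, events=("start",)):
        tag = elem.tag  # looks like '{namespace}database'
        return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return ""


def upgrade_command(src, dst):
    # Assumption: -i imports and -e exports; verify with `gramps --help`.
    return ["gramps", "-i", src, "-e", dst]


def prepare_for_compare(path, upgraded_path):
    """Return the path of a file in the target xml version, upgrading if needed."""
    namespace = root_namespace(path)
    if not namespace:
        raise ValueError("no xml namespace found in %s" % path)
    version = parse_version(namespace)
    if version > TARGET_VERSION:
        # Step 2: the file is newer than the script understands.
        raise RuntimeError("xml version too recent, please update the script")
    if version < TARGET_VERSION:
        # Step 3B: let Gramps itself upgrade the old xml.
        subprocess.check_call(upgrade_command(path, upgraded_path))
        return upgraded_path
    return path
```

`subprocess.check_call` raises `CalledProcessError` if Gramps exits with an error, which covers the "if the subprocess returned successfully" condition. The exported file may be gzip-compressed, in which case it has to be unpacked before the comparison.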


Re: Database compare and merge

derHeinzi

Thank you, Benny.

You are right. If it is possible to convert the xml files to the same format prior to comparison, that is the better way to handle renamed tags.

I'm still at the very beginning of db compare and merge. It's more testing of ideas and proofs of concept than coding even a preliminary version. :-)

Currently I'm playing around with a possible GUI for this (using Tkinter).
I'll show you my ideas sometime next week so that we can discuss them.

BTW, is there a very simple Gramps GUI "framework" available for developers that can be used to create gtk dialogs to be included in Gramps? It's hard (for me as a beginner) to find out what has to be imported to design a dialog for Gramps.

Heinz


Re: Database compare and merge

Benny Malengier


2011/10/13 Heinz Brinker <[hidden email]>

> <snip>
>
> BTW, is there a very simple Gramps GUI "framework" available for developers that can be used to create gtk dialogs to be included in Gramps? It's hard (for me as a beginner) to find out what has to be imported to design a dialog for Gramps.

PyGtk is simple in my opinion. For custom interfaces there is Glade (see the README in src/glade/catalog), combined with GtkBuilder, which is used by the tools, for example. See the tools in the plugin directory that have a glade file (e.g. relcalc.glade, desbrowser.glade, ...). Just bite the bullet and write what you are doing as a tool in Gramps right away, even if it works on two xml files. Distribution to Gramps users is then automatic if you get somewhere.

For reports we use an abstracted API so that they work in the CLI without a GUI, but what you are doing is best done as a tool, even if it writes out some report.

Please don't use Tkinter. There is no need to waste time :-)

Benny
