Gramps 5.0 Decisions

Re: Gramps 5.0 Decisions

DS Blank
On Tue, Aug 25, 2015 at 6:39 PM, Nick Hall <[hidden email]> wrote:

[snip]
 
>>
>> load_tbl_txn() creates a transaction and commits it but has no exception handling. It’s also creating and committing a new txn for each put; it would be faster to create a single txn and run the loop with just the one.
>
> This is new in gramps50 and I agree with you.  I haven't had a chance to
> look at the new gramps50 code yet.

There should be nothing new or different in gramps50 at the bsddb level.... this is only a refactor of existing code. load_tbl_txn was in gramps/gen/db/backup.py.

-Doug
 

------------------------------------------------------------------------------

_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: Gramps 5.0 Decisions

Nick Hall
On 26/08/15 11:51, Doug Blank wrote:
> There should be nothing new or different in gramps50 at the bsddb
> level.... this is only a refactor of existing code. load_tbl_txn was
> in gramps/gen/db/backup.py.
>

Thanks.  I was looking in write.py in v4.2.

Doug - Since you know this code better than I do, can you evaluate the
proposed changes?


Nick.


Re: Gramps 5.0 Decisions

John Ralls-2
In reply to this post by Nick Hall

> On Aug 25, 2015, at 11:39 PM, Nick Hall <[hidden email]> wrote:
>
> On 25/08/15 21:04, John Ralls wrote:
>> The cursor functions appear to me to not be wrapped in `with BSDDBTxn` blocks; what’s more, they’re passed self.txn as a transaction but it seems never to be initialized or committed. Finally, they have no exception handling except at their creation, and that’s only the @catch_db_error decorator which writes the log and re-raises.
>
> Gramps cursors are read-only.

No, the cursor class provides an “update” function which does a put protected only by DB_AUTO_COMMIT. If that’s not supposed to be used then it should be removed.

>
>
>>
>> load_tbl_txn() creates a transaction and commits it but has no exception handling. It’s also creating and committing a new txn for each put; it would be faster to create a single txn and run the loop with just the one.
>
> This is new in gramps50 and I agree with you.  I haven't had a chance to look at the new gramps50 code yet.
>
>
>>
>> commit_base() calls data_map.put and update_reference_map passing self.txn as the DbTxn; as pointed out above, that doesn’t seem to be initialized or committed anywhere, and even if it were there’s no exception handling.
>
> The commit functions will be called inside 'with DbTxn' blocks.   I don't see a problem here.

“with DbTxn” was a puzzle-piece I was missing, and I see that its __enter__() calls DbBsddb.begin_transaction() which does set self.txn unless batch=True. DbTxn(batch=True) is called in several importers and tools and at least some of those (presumably all of the importers) write to the database and those writes are therefore done without a transaction. The trade-off there is that it will use DB_AUTO_COMMIT to create a transaction per put, which is less efficient but which allows a put to fail and the rest of the import or tool to proceed. I trust that trade-off was made consciously.

The same applies if any of those functions are called outside of a `with DbTxn` block: The transaction protection comes from DB_AUTO_COMMIT; as long as the only possible exceptions come from libdb, I guess we’re OK.
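
The trade-off described above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for Berkeley DB (it is not the Gramps bsddb code), and the table, the sample records and the helper names are made up for the example. One helper lets every put commit on its own, so a bad record is skipped and the rest proceed; the other wraps the whole loop in a single transaction, so one failure rolls everything back.

```python
import sqlite3


def make_db():
    # In-memory stand-in for a key/value table; not the Gramps schema.
    # isolation_level=None leaves sqlite3 in autocommit mode, so a
    # transaction only exists when we issue BEGIN ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE person (handle TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def put_each_autocommit(conn, records):
    """One implicit transaction per put: a bad record is skipped."""
    failed = []
    for handle, data in records:
        try:
            conn.execute("INSERT INTO person VALUES (?, ?)", (handle, data))
        except sqlite3.Error:
            failed.append(handle)
    return failed


def put_all_single_txn(conn, records):
    """One explicit transaction around the loop: all records or none."""
    conn.execute("BEGIN")
    try:
        for handle, data in records:
            conn.execute("INSERT INTO person VALUES (?, ?)", (handle, data))
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]


# The second record violates NOT NULL, standing in for a failed put.
RECORDS = [("h1", "Alice"), ("h2", None), ("h3", "Carol")]
```

With these sample records, the per-put version stores two rows and reports "h2" as failed, while the single-transaction version raises and leaves the table empty.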

>> ISTM DbBsddb.txn should get removed. One doesn’t want to connect and disconnect from the database every time one wants to save something as that would be very expensive, but having a DbTxn member variable implies that one is using RAII [1], which would require destroying the DbBsddb object every time one wanted to commit.
>
> I haven't got time to investigate now, but I doubt that we are connecting and disconnecting the database for every transaction.

We’re not. That was the point: Having the member variable implies using RAII, which I’m used to seeing in construction/destruction of the containing object. In this case it’s done in DbTxn.__enter__() and DbTxn.__exit__() which I didn’t see until you pointed it out. I think that it would be clearer if the txn belonged to DbTxn instead of DbBsddb and that DbBsddb should get txn via its member instance of the DbTxn.
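
A minimal sketch of the ownership idea in the paragraph above, with hypothetical class names (`FakeDb`, `Txn`) that are not Gramps classes: the context manager holds the transaction state itself, and the database object only refers to the currently open manager for the duration of the `with` block.

```python
class FakeDb:
    """Hypothetical stand-in for a database object; not DbBsddb."""

    def __init__(self):
        self.rows = {}
        self.active_txn = None  # set only while a Txn block is open

    def put(self, key, value):
        if self.active_txn is None:
            # No enclosing block: write straight through (auto-commit style).
            self.rows[key] = value
        else:
            # Inside a block: stage the write on the transaction object.
            self.active_txn.pending[key] = value


class Txn:
    """Hypothetical context manager that owns its own pending writes."""

    def __init__(self, db):
        self.db = db
        self.pending = {}

    def __enter__(self):
        self.db.active_txn = self
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.db.active_txn = None
        if exc_type is None:
            self.db.rows.update(self.pending)  # commit the staged writes
        self.pending = {}  # after an error this discards the staged writes
        return False  # never swallow the exception
```

Used as `with Txn(db): db.put("a", 1)`, the write only becomes visible in `db.rows` once the block exits normally; the database object itself never stores a long-lived transaction.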

Perhaps a summary of the database corruption bugs should be the next step in our analysis.

Regards,
John Ralls

Re: Gramps 5.0 Decisions

Nick Hall
On 26/08/15 22:30, John Ralls wrote:
>> Gramps cursors are read-only.
> No, the cursor class provides an “update” function which does a put protected only by DB_AUTO_COMMIT. If that’s not supposed to be used then it should be removed.
>

I'm not sure what was intended here.  In the database API, it is
suggested that cursors can be used for quick iteration over tables and
return raw serialised objects.

For updating records we iterate over handles, and a DbTxn is used.
Attempting to update an object returned by an object iterator results in
an error.

I agree that our use of cursors should be reviewed, and the update
function possibly removed.

Nick.



Re: Gramps 5.0 Decisions

Nick Hall
In reply to this post by John Ralls-2
On 26/08/15 22:30, John Ralls wrote:
> Perhaps a summary of the database corruption bugs should be the next step in our analysis.

Good idea.  We may have some bugs to fix in v4.2.  Shall we create a bug
report for this?

Nick.



Re: Gramps 5.0 Decisions

John Ralls-2

> On Aug 27, 2015, at 5:27 PM, Nick Hall <[hidden email]> wrote:
>
> On 26/08/15 22:30, John Ralls wrote:
>> Perhaps a summary of the database corruption bugs should be the next step in our analysis.
>
> Good idea.  We may have some bugs to fix in v4.2.  Shall we create a bug report for this?

And go through the tracker marking everything that looks like a corrupt database as related? That might be a useful exercise.

Regards,
John Ralls




Re: Gramps 5.0 Decisions

Sebastian Schubert
In reply to this post by DS Blank
Hi devs,

sorry for butting in. I just want to describe my workflow, which fits
one of your proposals.

On 14/08/15 13:59, Doug Blank wrote:
> 2. Auto Backups
>
> We could help issue #1 by making sure that there is always an archive
> export (independent of database backend, say XML) on each exit. However,
> for large number of records, this would take too much time.
>
> We might be able to make this more amenable by saving only changed items
> (basically a diff) in the archived format.

Even now, I do exactly that. After finishing some work, I export my
database to (plain text) XML. This file and all my media files are
managed by git. This has the following advantages:

* diffs: I can check whether something stupid happened. Sure, it's not
perfect but for example if I added only people I expect a lot of green
and very little red in the diffs...
* the ability to go back to older versions comforts me :)
* no python, database, ... upgrade issues
* plain text will be readable, hopefully, in 100 years so information
could be extracted
* sharing among computers is just a matter of git push/pull. Again no
python and database issues (should be same gramps version, though).
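
The "lot of green and very little red" check from the first bullet can be approximated without git. The following is only a sketch using Python's standard `difflib`; the function name is hypothetical and the sample XML lines are invented and far simpler than a real Gramps export.

```python
import difflib


def count_changes(old_lines, new_lines):
    """Return (added, removed) line counts between two text exports."""
    added = removed = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


# Invented sample data, much simpler than a real Gramps XML export.
old = ['<person handle="a"/>', '<person handle="b"/>']
new = ['<person handle="a"/>', '<person handle="b"/>', '<person handle="c"/>']

added, removed = count_changes(old, new)
print(f"{added} added, {removed} removed")  # prints: 1 added, 0 removed
```

If only people were added, the removed count should stay at or near zero; a large removed count is a hint that something unexpected happened.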

Maybe this also sounds useful to others...

Cheers
Sebastian


Re: Gramps 5.0 Decisions

Oldest1
On 8/28/2015 12:50 AM, Sebastian Schubert wrote:

Your 'backup' scheme sounds very much like what I would need/want.

The easier it is to make a backup, the more likely it will actually end up
being done. :-)

> Even now, I do exactly that. After finishing some work, I export my
> database to (plain text) XML. This file and all my media files are
> managed by git. This has the following advantages:
>
> * diffs: I can check whether something stupid happened. Sure, it's not
> perfect but for example if I added only people I expect a lot of green
> and very little red in the diffs...
> * the ability to go back to older versions comforts me :)
> * no python, database, ... upgrade issues
> * plain text will be readable, hopefully, in 100 years so information
> could be extracted
> * sharing among computers is just a matter of git push/pull. Again no
> python and database issues (should be same gramps version, though).
>
> Maybe this also sounds useful to others...

It certainly does help me.

Arnold

--
Fight Spam - report it with wxSR 0.7
Vista & Win7 compatible
http://www.columbinehoney.net/wxSR.shtml



Re: Gramps 5.0 Decisions

Oldest1
On 8/28/2015 9:42 AM, jecxz112 wrote:
> On 8/28/2015 12:50 AM, Sebastian Schubert wrote:
>
> Your 'backup' scheme sounds very much like what I would
> need/want.
After more thought, and because I am so new to git, what is the
recommended or usual way to make a backup - possibly
incremental, other than just another clone?

Arnold


Re: Gramps 5.0 Decisions

Nick Hall
In reply to this post by John Ralls-2
On 27/08/15 22:19, John Ralls wrote:
>> On Aug 27, 2015, at 5:27 PM, Nick Hall <[hidden email]> wrote:
>>
>> On 26/08/15 22:30, John Ralls wrote:
>>> Perhaps a summary of the database corruption bugs should be the next step in our analysis.
>>
>> Good idea.  We may have some bugs to fix in v4.2.  Shall we create a bug report for this?
> And go through the tracker marking everything that looks like a corrupt database as related? That might be a useful exercise.

Yes.  I'll start with a search for "DB_RUNRECOVER".


Nick.


Re: Gramps 5.0 Decisions

Nick Hall
In reply to this post by Sebastian Schubert
On 28/08/15 08:50, Sebastian Schubert wrote:

> Even now, I do exactly that. After finishing some work, I export my
> database to (plain text) XML. This file and all my media files are
> managed by git. This has the following advantages:
>
> * diffs: I can check whether something stupid happened. Sure, it's not
> perfect but for example if I added only people I expect a lot of green
> and very little red in the diffs...
> * the ability to go back to older versions comforts me:)
> * no python, database, ... upgrade issues
> * plain text will be readable, hopefully, in 100 years so information
> could be extracted
> * sharing among computers is just a matter of git push/pull. Again no
> python and database issues (should be same gramps version, though).
>
> Maybe this also sounds useful to others...

I do something very similar.

It is a good idea to make Gramps XML backups on a regular basis. Using
backups to transfer databases between machines and for upgrades is also
a good approach.

However, using git for version control will not be appealing to most users.


Nick.



Re: Gramps 5.0 Decisions

Sebastian Schubert
In reply to this post by Oldest1
Hi Arnold,

> After more thought, and because I am so new to git, what is the
> recommended or usual way to make a backup - possibly
> incremental, other than just another clone?

I don't know whether my approach is recommended but it certainly works
for me. However, the second script is a bit risky but I guess it could
be improved (see below).

Bash-Script to make a gramps xml backup of the database:

###### db2xml #############################################
FTREE="NAMEOFYOURTREE"                 # name of the family tree in Gramps
REPO="${HOME}/gramps"                  # directory managed by git
GRAMPSDB="${HOME}/.gramps/grampsdb/"   # not used in this script

# keep a timestamped copy of the previous XML export
if [ -e "${REPO}/${FTREE}.gramps" ]; then
    mkdir -p "${REPO}/xmlbackup"
    mv "${REPO}/${FTREE}.gramps" \
       "${REPO}/xmlbackup/$(date +%FT%H%M%S).gramps"
fi

# export the family tree (gzipped Gramps XML) and decompress it
TMPFILE=$(mktemp)
gramps -y -e "${TMPFILE}" -f gramps -O "${FTREE}"
gunzip < "${TMPFILE}" > "${REPO}/${FTREE}.gramps"
rm "${TMPFILE}"
###########################################################

First, a backup of an existing XML file is made. Then, the gramps command
line is used to export the Gramps XML (gzipped). This gzipped file is
then extracted. I manage the resulting file with git, so I can do the
usual git stuff. (For that, it is useful when the file is just plain text
and not compressed. Git compresses its internal files anyway.)
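
The decompression step (the `gunzip` line in the script) could also be done with Python's standard `gzip` and `shutil` modules. This is only a sketch: the function name and the paths passed to it are placeholders, and it assumes the exported file is gzip-compressed as described above.

```python
import gzip
import shutil


def decompress_export(src_path, dest_path):
    """Copy the gzip-compressed file at src_path to dest_path, uncompressed."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
```

The resulting plain-text file is the one that would then be committed to git.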

The script to import the xml file back into a database has an issue.
When you import a gramps xml file, the data is added to an existing
database with the same name. Thus, a binary database with the same name
must not exist. Now, I was too lazy to search for the correct database
and because I only have one database, I just move the complete gramps
database folder (with a backup, though). So if you have more than one
gramps database, you will have to enhance the script here.

###### xml2db #############################################
FTREE="NAMEOFYOURTREE"
REPO="${HOME}/gramps"
GRAMPSDB="${HOME}/.gramps/grampsdb"

# move the whole binary Gramps database folder out of the way
# (this affects ALL family trees, not just ${FTREE})
if [ -e "${GRAMPSDB}" ]; then
    mkdir -p "${REPO}/dbbackup"
    mv "${GRAMPSDB}" "${REPO}/dbbackup/"
fi

# open the XML file in Gramps for editing (runs in the background)
gramps "${REPO}/${FTREE}.gramps" &

# compress the moved database folder; stop if cd fails so that tar and
# rm never run in the wrong directory, and only delete after tar succeeds
if [ -d "${REPO}/dbbackup/grampsdb" ]; then
    cd "${REPO}/dbbackup/" || exit 1
    tar cjf "$(date +%FT%H%M%S).tar.bz2" "grampsdb" && rm -r "grampsdb"
fi
###########################################################

Cheers
Sebastian


Re: Gramps 5.0 Decisions

John Ralls-2
In reply to this post by Nick Hall

> On Aug 29, 2015, at 4:29 AM, Nick Hall <[hidden email]> wrote:
>
> On 27/08/15 22:19, John Ralls wrote:
>>> On Aug 27, 2015, at 5:27 PM, Nick Hall <[hidden email]> wrote:
>>>
>>> On 26/08/15 22:30, John Ralls wrote:
>>>> Perhaps a summary of the database corruption bugs should be the next step in our analysis.
>>>
>>> Good idea.  We may have some bugs to fix in v4.2.  Shall we create a bug report for this?
>> And go through the tracker marking everything that looks like a corrupt database as related? That might be a useful exercise.
>
> Yes.  I'll start with a search for "DB_RUNRECOVER".
>

Another set to consider results from searching “process-private”, which is part of the bad DbEnv error stream.

Regards,
John Ralls



Re: Gramps 5.0 Decisions

Oldest1
In reply to this post by Sebastian Schubert
On 8/29/2015 5:18 AM, Sebastian Schubert wrote:
> Hi Arnold,
>
>> After more thought, and because I am so new to git, what is the
>> recommended or usual way to make a backup - possibly
>> incremental, other than just another clone?
> I don't know whether my approach is recommended but it certainly works
> for me. However, the second script is a bit risky but I guess it could
> be improved (see below).
Thank you for providing your script, including the cautions.

Before I read your post, I had looked at ways to back up a git
repository using 'git archive', because even if I follow your path,
there still remains the issue of backing up the git repo.

On the whole I am still considering a local Subversion repository,
since in this case there is no need for distribution and I very much
want to keep control of the data locally.
Not that being more familiar with SVN than git - at least right now -
has any bearing :-)

OTOH, is gramps not able to produce an uncompressed .gramps file, to
make it unnecessary to unzip it for backup? (I still don't have many
of these details in my head, .... )

At what stage was RCS discontinued as an available option?

Arnold




Re: Gramps 5.0 Decisions

Benny Malengier
In reply to this post by John Ralls-2
2015-08-26 23:30 GMT+02:00 John Ralls <[hidden email]>:

>> The commit functions will be called inside 'with DbTxn' blocks.   I don't see a problem here.
>
> “with DbTxn” was a puzzle-piece I was missing, and I see that its __enter__() calls DbBsddb.begin_transaction() which does set self.txn unless batch=True. DbTxn(batch=True) is called in several importers and tools and at least some of those (presumably all of the importers) write to the database and those writes are therefore done without a transaction. The trade-off there is that it will use DB_AUTO_COMMIT to create a transaction per put, which is less efficient but which allows a put to fail and the rest of the import or tool to proceed. I trust that trade-off was made consciously.

Yes, I remember this was a trade-off at the time, discussed by Don. Either
things were not working in a single transaction (ideally you would
lock the entire db instead of row by row for batch use) and it was
not possible to split things up, as an importer only finishes at the
end, or it was too slow.

To come back to bsddb, it was chosen originally because it was present
in Python 2.x, so it was an easy install, and tests showed it to be much
faster than an SQL db. If I'm right, a current SQL backend would store the
same pickled data as bsddb, and so would not suffer from a speed degradation.
bsddb no longer being present in Python 3.x invalidates the 'easy install'
argument.

For BSDDBTxn, is there a bug ticket to add after
https://github.com/gramps-project/gramps/blob/master/gramps/plugins/database/bsddb_support/bsddbtxn.py#L79
       self.abort()
and then test with some forced errors in the commits in write.py?
It seems to me the main test would be that a decent error message reaches
the user. E.g. should writing the version fail in
https://github.com/gramps-project/gramps/blob/master/gramps/plugins/database/bsddb_support/write.py#L2385
and __exit__ return False, some sensible error should make it to
the user to put in their bug report.
Otherwise, I can think of no reason not to have self.abort() present.
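
As a rough illustration of the suggested change and of a forced-error test, here is a self-contained sketch. `FakeLowLevelTxn`, `GuardedTxn` and `write_version` are hypothetical names invented for this example; this is not the real BSDDBTxn or write.py code, only the general shape of an `__exit__` that aborts on error and returns False so the original exception still reaches the caller.

```python
class FakeLowLevelTxn:
    """Hypothetical stand-in for a database transaction handle."""

    def __init__(self):
        self.state = "open"

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class GuardedTxn:
    """Hypothetical context manager; not the real BSDDBTxn."""

    def __init__(self):
        self.txn = None

    def __enter__(self):
        self.txn = FakeLowLevelTxn()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.txn.commit()
        else:
            self.txn.abort()  # the call proposed above
        return False  # let the original exception propagate


def write_version(guard, fail):
    """Simulate a commit that can be forced to fail, for testing."""
    with guard:
        if fail:
            raise IOError("forced error while writing version")
```

Calling `write_version(GuardedTxn(), fail=True)` raises the IOError with its message intact, which is the part that would need to end up in front of the user.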

Benny
