Gramps 5.0 Decisions

Gramps 5.0 Decisions

DS Blank
Devs,

Now that Gramps 4.2 is out, there are a few items that need to be discussed and decided for Gramps 5.0. Each of these deserves its own thread, but they are also related, so I'll at least start out here.

There are 4 main items: Updates, Auto Backups, Table Views, and Gramplets' Speed. First, the issues:

1. Updates

Currently, when there is a change in any database format, we need to add a section of code in upgrade.py implementing the changes from version N-1 to N. An old BSDDB database might go through many of these steps to bring it up to date. Changes include: new tables, data format changes, and pickle changes.

On the other hand, the XML import has been designed to be able to import any older format. 

Unfortunately, when BSDDB updates, or Gramps updates, it is too late to make an XML backup with the previous version. So, we have had to update the BSDDB data.

This is going to be impossible to maintain with many different database backends. For some systems, it might not even be possible to migrate an old format into a new format in place.

2. Auto Backups

We could help with issue #1 by making sure that there is always an archive export (independent of database backend, say XML) on each exit. However, for a large number of records, this would take too much time.

We might be able to make this more tractable by saving only changed items (basically a diff) in the archived format.

3. Table Views

Currently, some table views have a lot of callbacks, made visible through slow scrolling. (Nick has a prototype to fix this). This is made even worse with high-latency backends if there is any database querying while scrolling or mouse movement.

4. Gramplets' Speed

There are some Gramplets that require a linear traversal through a table. This brings down the speed of the entire system.

Suggestions:

1. Updates

We no longer attempt to upgrade backend databases. We focus on being able to import all older archived data, and making sure there is archived data.

2. Auto backups

On close, all database backends should write differential changed data into an archive format. JSON format (dictionaries and lists) might be ideal for this. This could also help in being able to restore back to any point in the past (ie, checkpoints).
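A minimal sketch of what such a differential write might look like, assuming each object carries a handle and a change timestamp and is already serialized to dictionaries and lists (all names here are illustrative, not the actual Gramps API):

```python
import json
import time

def write_differential_backup(objects, last_backup_time, path):
    """Write only objects changed since the last backup to a JSON archive.

    `objects` is an iterable of (handle, change_time, data) tuples, where
    `data` is the object already serialized to dictionaries and lists.
    Returns the number of objects written.
    """
    changed = [
        {"handle": handle, "change": change_time, "data": data}
        for handle, change_time, data in objects
        if change_time > last_backup_time
    ]
    archive = {"timestamp": time.time(), "objects": changed}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(archive, f, indent=2)
    return len(changed)
```

The archive stays human-readable and backend-independent, and a sequence of such files doubles as a checkpoint history.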

3. Table views

All table/tree views should load the data up-front, and have no database access afterwards, with no callbacks necessary (like Nick's prototype).

Additionally, if the number of records in a view exceeds a maximum value, we should display the data as a "paged view", showing only N records at a time, with "next page", "previous page" buttons.
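The paging itself is simple enough to sketch (a generic illustration, not tied to any existing Gramps view class):

```python
class PagedView:
    """Keep only one page of records for display at a time."""

    def __init__(self, records, page_size=250):
        self.records = records
        self.page_size = page_size
        self.page = 0

    @property
    def num_pages(self):
        # Ceiling division: at least one page, even when empty.
        return max(1, -(-len(self.records) // self.page_size))

    def current(self):
        start = self.page * self.page_size
        return self.records[start:start + self.page_size]

    def next_page(self):
        # Clamp at the last page rather than running past the end.
        if self.page < self.num_pages - 1:
            self.page += 1
        return self.current()

    def prev_page(self):
        if self.page > 0:
            self.page -= 1
        return self.current()
```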

4. Gramplets' Speed

We should go through all gramplets and change them to all use indexes, or summary data in the database. This may include removing some functionality or adding additional indexes/structures in the database backend API.

If a gramplet needs to do a linear scan, then that functionality should be moved to a Tool.

----
There may well be unintended consequences of some of these suggestions. Anything else that we should begin to explore?

-Doug

------------------------------------------------------------------------------

_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: Gramps 5.0 Decisions

Oldest1
On 8/14/2015 4:59 AM, Doug Blank wrote:
--------8X---------------------
> Suggestions:
>
> 1. Updates
>
> We no longer attempt to upgrade backend databases. We
> focus on being able to import all older archived data, and
> making sure there is archived data.
 From my perspective this is the most important part of all.
Any application which cannot import its own data from
earlier versions - no matter how old - is something I would
avoid at all cost.
Especially for genealogical work it would put a lot of work
in jeopardy.

Which is also my reason for looking for support of as much
of the GEDCOM varieties from other apps as possible.

>
> 2. Auto backups
>
> On close, all database backends should write differential
> changed data into an archive format. JSON format
> (dictionaries and lists) might be ideal for this. This
> could also help in being able to restore back to any point
> in the past (ie, checkpoints).
Also a must, for the same reasons. Crashes and mishaps are
part of life - recovery of as much data as possible, with
reasonable effort for the average user, is crucial.

Seems to me something along those lines was once intended to
be covered by RCS - has this evolved or disappeared?

Arnold

--
Fight Spam - report it with wxSR 0.7
Vista & Win7 compatible
http://www.columbinehoney.net/wxSR.shtml



Re: Gramps 5.0 Decisions

Josip
In reply to this post by DS Blank
14. 08. 2015. u 13:59, Doug Blank je napisao/la:
> 3. Table views
>
> All table/tree views should load the data up-front, and have no database
> access afterwards, with no callbacks necessary (like Nick's prototype).
>

Sorry, I didn't follow that! What prototype? Where is it, and how can I test it?

--
Josip


Re: Gramps 5.0 Decisions

enno
Josip,
> 14. 08. 2015. u 13:59, Doug Blank je napisao/la:
>> 3. Table views
>>
>> All table/tree views should load the data up-front, and have no database
>> access afterwards, with no callbacks necessary (like Nick's prototype).
>>
> Sorry did't follow that! What prototype, where and how to test it?
>
https://github.com/Nick-Hall/gramps/tree/view-models

Try the person and citation trees.

Enno



Re: Gramps 5.0 Decisions

Josip
14. 08. 2015. u 21:03, Enno Borgsteede je napisao/la:

> Josip,
>> 14. 08. 2015. u 13:59, Doug Blank je napisao/la:
>>> 3. Table views
>>>
>>> All table/tree views should load the data up-front, and have no database
>>> access afterwards, with no callbacks necessary (like Nick's prototype).
>>>
>> Sorry did't follow that! What prototype, where and how to test it?
>>
> https://github.com/Nick-Hall/gramps/tree/view-models
>
> Try the person and citation trees.
>

Thanks Enno!

I like the two-trees idea, but I find the implementation very disappointing.
I tested with our own example.gramps and the citation tree.

The left tree lags badly when scrolling while the right tree is big; it
sometimes even looks stuck when switching from a source with lots of
citations to another, even when the other has just a few.
Why is that? Why isn't the selection instant?

Populating the right view is also slow: the biggest source, "Import from
test2.ged", has only 2800 citations, and loading them is noticeably slow
(what will happen with bigger trees?).

The trees appear to be loaded all in one pass rather than dynamically,
which adds to the slowness of filling them, not to mention that it blocks
the GUI.


--
Josip


Re: Gramps 5.0 Decisions

Nick Hall
In reply to this post by DS Blank
Doug,

Are we actually going to support more than one database?  I thought that
we agreed to have only BSDDB in core Gramps and the other backends as
addons.

Although it would be tempting to get rid of database upgrades, they are
convenient for our users.  The upgrade is working at the moment so we
could just leave it in.  Are any schema changes planned for v5.0?

New database backends will have nothing to upgrade from.  I don't
suggest that we attempt to transfer data between backends without going
through Gramps XML.

Automatic backups are a good idea.  I actually wrote a gramplet for a
user who wanted automatic backups at fixed intervals.

We have covered views in a separate thread.

Some gramplets have always been slow.  This is not normally a problem
because they are only updated when active.  Which gramplets do full
table scans?  Do we really want to convert gramplets like the name
clouds into tools?  Tools are primarily used to perform actions
rather than display information.


Nick.




Re: Gramps 5.0 Decisions

DS Blank
On Sat, Aug 15, 2015 at 1:08 PM, Nick Hall <[hidden email]> wrote:
Doug,

Are we actually going to support more than one database?  I thought that
we agreed to have only BSDDB in core Gramps and the other backends as
addons.

Right, there will be one official database backend. (Which one that is can change over time, but yes, we discussed leaving it as BSDDB for 5.0. Of course, that was before the other backends were completed and we found some of the huge speed differences. I don't personally plan on enhancing BSDDB, but of course I would fix bugs as they are found.) I think we should keep DictionaryDB in core, if for nothing more than unit testing. (It is also very useful for huge trees; as I write this I am importing a 42M GEDCOM file, which I will then export in XML or JSON for use in a persistent tree--- 159k people, just finished.) DBAPI and DjangoDB are in Addons, and are planned to stay there unless we ever want to switch to one as the official default.

But, I plan on keeping these 4 backends (BSDDB, DBAPI, DictDb, and Django) working and operating as fast as possible.
 

Although it would be tempting to get rid of database upgrades, they are
convenient for our users.  The upgrade is working at the moment so we
could just leave it in.  Are any schema changes planned for v5.0?

No schema changes are planned. In fact, I hope that we can put off any proposals for schema changes until after 5.0, as all of the backends are now complete (as of this week). And we have other things to address...

The BSDDB upgrade process is only convenient if Gramps can continue to read the low-level file format, and only for BSDDB. Many times in the past, that was made impossible due to changes in BSDDB software. I'd rather work on an upgrade process that was useful for all backends. 

(Actually, for just pickled-data upgrades, the current upgrade system would mostly work for the other backends as well. In fact, all of the other persistent database backends use pickled blobs. So, we could consider keeping that approach for general use for the time being. Sqlite files won't be changing for the foreseeable future, I would guess.)
 

New database backends will have nothing to upgrade from.  I don't
suggest that we attempt to transfer data between backends without going
through Gramps XML.

Right, but I am planning ahead for when the schema changes. I described a method to upgrade based on *always* having archive files. Having checkpoints would be useful in general.

Automatic backups are a good idea.  I actually wrote a gramplet for a
user who wanted automatic backups at fixed intervals.

That would be useful to see.
 

We have covered views in a separate thread.

Some gramplets have always been slow.  This is not normally a problem
because they are only updated when active.  Which gramplets do full
table scans?

One of the default ones: Statistics. That could easily give more useful stats... do you really need a full person-table scan every time you open the tree to see that there are 16 people with Unknown gender? And yet that is the default.
 
  Do we really want to convert gramplets like the name
clouds into tools?  Tools are are primarily used to perform actions
rather than display information.

My point is that you can do the Surname Cloud gramplet, and even a Given Name Cloud, without doing a full scan. (Surnames are in db.get_surname_list(), and given names can be read from db.genderStats.) If we can't get fast operation without adding indexes or other summary structures, we could move these to tools. (Actually, that might be too strong. Maybe just mark them visually, say with a red title/background or something, to indicate that they are linear.)
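The underlying pattern - maintain a summary structure incrementally at commit time instead of rescanning the table - can be sketched generically (an illustration of the idea, not the actual db.genderStats implementation):

```python
from collections import Counter

class GenderStats:
    """Gender counts maintained incrementally on each commit,
    so no gramplet ever needs a full person-table scan."""

    def __init__(self):
        self.counts = Counter()

    def person_added(self, gender):
        # Called from the database commit path when a person is added.
        self.counts[gender] += 1

    def person_removed(self, gender):
        # Called from the commit path when a person is deleted.
        self.counts[gender] -= 1
        if self.counts[gender] <= 0:
            del self.counts[gender]

    def summary(self):
        # O(number of genders), regardless of table size.
        return dict(self.counts)
```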

The big issue is to identify those parts that are slow for big tables, and either fix them, or mark them as such.

-Doug
 



Re: Gramps 5.0 Decisions

Nick Hall
On 15/08/15 19:13, Doug Blank wrote:
  Do we really want to convert gramplets like the name
clouds into tools?  Tools are are primarily used to perform actions
rather than display information.

My point is that you can do the Surname Cloud gramplet, and even a Given Name Cloud, without doing a full scan. (Surnames are in db.get_surname_list(), and given names can be read from db.genderStats.) If we can't get fast operation without adding indexes or other summary structures, we could move these to tools. (Actually, that might be too strong. Maybe just mark them visually, say with a red title/background or something, to indicate that they are linear.)

The big issue is to identify those parts that are slow for big tables, and either fix them, or mark them as such.


Perhaps we could limit the use of slow gramplets to the dashboard?  They would only update when the dashboard is displayed.

I have no objections to adding extra indexes.


Nick.



Re: Gramps 5.0 Decisions

Tim Lyons
Administrator
In reply to this post by Oldest1
Oldest1 wrote
 From my perspective this is the most important part of all.
Any application which cannot import its own data from
earlier versions - no matter how old - is something I would
avoid at all cost.
Especially for genealogical work it would put a lot of work
in jeopardy.
I think this is a very reasonable point of view.

Unfortunately, Gramps has long had difficulties with its database. If we look in the bug tracker or the mailing list, we see many problems where upgrade has caused people to find that their database is corrupted, or even just that normal use has resulted in database corruption. Unfortunately, in most cases, the problem has not been reproducible.

I agree that some corruptions are due to bugs that we have fixed (e.g. in the upgrade code using object definitions that are not the correct version for what the code is trying to do, or just simple oversights in the upgrade code).

We have tried to address this issue by giving clearer warnings about upgrade, and by generating a database dump before upgrade in the hope that this can be used if the upgrade fails (sadly, I don't recall a time this rescued a user In Real Life).

Many users report that they no longer have access to the old Gramps version; on Mac OS, it is easy to keep several different versions, but many other people seem to lose their old Gramps, e.g. when they upgrade their OS.

Maybe making an XML dump every time a database is closed would be a way forward, although the time delay could be a problem. I don't see how making a delta dump would work - how would the changes be discovered? Would it involve much complicated (and hence possibly error-prone) code?

None of these approaches seem to help those users who just keep their database open for long periods of time (this seems to be quite a common standard operating procedure), before the database suddenly becomes corrupt.

I don't know what the solution is; perhaps the real problem is that we don't have enough expertise in the lower levels of bsddb; would we have better expertise in any of the other DBMSs?

Regards,
Tim.

Re: Gramps 5.0 Decisions

DS Blank
On Sun, Aug 16, 2015 at 6:30 PM, Tim Lyons <[hidden email]> wrote:

I think you summarize the issue well.

I think that many of the problems of the past will go away with DBAPI using sqlite3. The file format (probably) won't change, and has a good support of tools. But I want a great system that will work regardless of what database backend a user uses.

Here is what I am imagining: we could do a serialized dump (XML or JSON) of an object after it changes, to a timestamped text file. This file would include all of the items that changed in a transaction. So, every file would have all of the changed objects from the transaction. Deleted objects would be marked as well.

So, you have a full backup occasionally, followed by a series of transaction files. One can recover any state by starting with a backup and "replaying" the transactions from then up to the desired point. I think it is simple, and would be hard to mess up... you might lose a transaction (file corruption), but that is fairly robust.
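The replay step could be sketched like this, assuming each transaction file is a JSON list of records with a handle and either the object's data or null for a deletion (file and field names are illustrative):

```python
import json

def replay(full_backup, transaction_files):
    """Rebuild database state from a full backup plus transaction files.

    `full_backup` maps handle -> object data; each transaction file is a
    JSON list of {"handle": ..., "data": ...} records, where data of None
    marks a deletion.
    """
    state = dict(full_backup)
    # Timestamped filenames sort in commit order.
    for path in sorted(transaction_files):
        with open(path, encoding="utf-8") as f:
            for record in json.load(f):
                if record["data"] is None:
                    state.pop(record["handle"], None)
                else:
                    state[record["handle"]] = record["data"]
    return state
```

With timestamped filenames, recovery is just: load the latest full backup, then apply every later transaction file in name order, stopping at the checkpoint you want.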

Still imagining how this could work....

-Doug
 


Re: Gramps 5.0 Decisions

Oldest1
In reply to this post by Tim Lyons
On 8/16/2015 3:30 PM, Tim Lyons wrote:

> Oldest1 wrote
>>   From my perspective this is the most important part of all.
>> Any application which cannot import its own data from
>> earlier versions - no matter how old - is something I would
>> avoid at all cost.
>> Especially for genealogical work it would put a lot of work
>> in jeopardy.
> I think this is a very reasonable point of view.
>
> Unfortunately, Gramps has long had difficulties with its database.
All the more reason, IMO, to use a better-understood and better-behaved
DB back-end, as well as more rigorous testing - though I appreciate that
that part of the work is not very appealing.
And all the rest of the arguments regarding this db issue only
emphasize the problem :-(

Unfortunately, at this stage I am new to so much of gramps that all it
seems I can do is find issues I feel are important and which need
attention.
> Many users report that they no longer have access to the old Gramps version;
> on Mac OS, it is easy to keep several different versions, but many other
> people seem to lose their old Gramps, e.g. when they upgrade their OS.
It seems to me that that issue can be addressed, certainly under
Windows, which I am most familiar with. In fact, I don't see an issue
in this regard under Windows - in the short time that I have used
gramps, I have not had any issue with this.

Aside from my initial uncertainty about how things worked with gramps,
I now have several different installations, with separate data
directories as set in Preferences -> Family Tree. Some versions I start
using a batch file that sets GRAMPSHOME to a separate directory, though
even in this case there remains the option to override this directory
via Preferences.

But, IMO, that is not really the issue.

No other serious application I am aware of makes having old versions
installed and working a requirement simply, in effect, to be able to
import old data, and those that have done so have rightly been left in
the dust - I hope.

The one app I can think of that provides db updates only if the
existing, to-be-updated installation is present during the update shall
remain nameless, but this issue is one of the reasons I would like to
switch to gramps.

This still leaves the media, and their links to the current db, as
another not-so-little piece of this issue. They do need to be part of
any usable backup, especially for the less tech-savvy users, who don't
have the background to come up with workarounds.

So users are impatient and want instantaneous backups?

How much more time will users have to invest if even just the links are
broken, not to mention a total or even partial loss of the necessary
media, compared to the time it takes to make a usable backup?

Arnold


Re: Gramps 5.0 Decisions

Tim Lyons
Administrator
In reply to this post by DS Blank
DS Blank wrote
I think you summarize the issue well.
Thank you.

DS Blank wrote
I think that many of the problems of the past will go away with DBAPI using
sqlite3. The file format (probably) won't change, and has a good support of
tools. But I want a great system that will work regardless of what database
backend a user uses.
Why do you think sqlite will make the problems go away?

I expect that sqlite has a selection of things like checkpoints, before-looks, after-looks, roll-back, roll-forward etc just like BSDDB. I expect that sqlite is quite a complicated piece of software, just like BSDDB. I expect sqlite has a whole range of configuration options just like BSDDB. I expect sqlite has few Gramps developers who have a low level understanding of the database, just like BSDDB. I expect sqlite has bugs just like BSDDB. BSDDB is supported by Oracle, so I would expect it to be fairly robust.

OK, maybe you will say that sqlite is simpler than BSDDB, but I doubt that it is the orders of magnitude simpler that would avoid these problems. I understand that BSDDB is designed for concurrent access, whereas sqlite is less so. However, we seem to get corruption problems even when there is only one user accessing the database (opening the same family twice is sometimes the cause of problems, but not often).

How do sqlite and BSDDB compare in reliability if an application crashes in the middle of writing a transaction? My guess is that some of the database corruptions we see are due to errors in finding auxiliary files (hence the messages about 'environment not found'); is sqlite likely to be any different here?

I am not against sqlite in general, just why should I believe it is more reliable?

DS Blank wrote
Here is what I am imaging: we could do a serialized dump (XML or JSON) of
an object after it changes to a timestamped text file. This file would
include all of the items that changed in a transaction. So, every file
would have all of the changed objects from the transaction. Deleted objects
would be marked as well.

So, you have a full backup occasionally, followed by a series of
transaction files. One can recover any state by starting with a backup, and
"replaying" the transactions from then until a point beyond that. I think
it is simple, and would be hard to mess up... you might lose a transaction
(file corruption) but that is fairly robust.
Isn't that what a DBMS does? And a DBMS is quite a complicated thing to implement, so why do you think we could implement a reliable transaction system to produce XML output? I could see all sorts of difficulties in reliably replicating all updates into XML, and in replaying them. Just as a set of random problems: what about timestamp accuracy and consistency, synchronising the full backup with the incremental ones, Gramps database undo or roll-back, ensuring we get it right every time, multiple threads, etc.? It doesn't sound like something that would be easy to get reliable enough.

Regards,
Tim.

Re: Gramps 5.0 Decisions

Tom Hughes
On 18/08/15 17:33, Tim Lyons wrote:

> OK, maybe you will say that sqlite is simpler than BSDDB, but I doubt that
> it is the orders of magnitude simpler that will avoid these problem. I
> understand that BSDDB is designed for concurrent access, whereas sqlite is
> less so. However, we seem to get corruption problems even when there is only
> one user accessing the database (opening the same family twice is sometimes
> the cause of problems, but not often).

In principle BSDDB may support concurrent access, but in practice gramps
has for some time been using it in a way that blocks any other access
while gramps has the database open.

Tom

--
Tom Hughes ([hidden email])
http://compton.nu/

------------------------------------------------------------------------------
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: Gramps 5.0 Decisions

enno
In reply to this post by Tim Lyons
Tim,
> How do sqlite and BSDDB compare in reliability if an application
> crashes in the middle of writing a transaction? My guess is that some
> of the database corruptions we see are due to errors in finding
> auxiliary files (hence the messages about 'environment not found'). Is
> sqlite likely to be any different in possible problems here? I am not
> against sqlite in general, just why should I believe it is more reliable?
Well, first, I believe there are no auxiliary files, or not as many as
with BSDDB. I may have seen a .journal file when working with
RootsMagic, but it basically uses a single file, with a single format,
regardless of OS, CPU architecture, whatever.

Second, I use SQLite everyday, because it is embedded in my phone
(Android), tablet (Android), and my mail program (Thunderbird), so I use
it while typing this. It's also used in Firefox, and I haven't seen an
SQLite related anomaly in any of these applications ever. And you're
using it too, because according to this page

https://www.sqlite.org/famous.html

it's in all sorts of Apple products too.

In other words, SQLite has an installed base that's way larger than the
average for BSDDB, and you probably use it more often than Gramps
itself, so if it were causing problems, you'd probably know.

This is not to say that all problems we have today are caused by BSDDB,
nor that they go away when we switch to SQLite. If we mess up by writing
bad (binary) data, or use Pickle in the wrong way, we will still have
problems. But when such problems arise, we, and maybe more importantly,
users, have more tools available. There are lots of tools out there, many
of them probably more user-friendly than the ones for BSDDB, where you
need to find the right version on the Oracle site, which is a challenge
by itself.

regards,

Enno



Re: Gramps 5.0 Decisions

DS Blank
In reply to this post by Tim Lyons
On Tue, Aug 18, 2015 at 12:33 PM, Tim Lyons <[hidden email]> wrote:
DS Blank wrote
> I think you summarize the issue well.

Thank you.


DS Blank wrote
> I think that many of the problems of the past will go away with DBAPI
> using
> sqlite3. The file format (probably) won't change, and has a good support
> of
> tools. But I want a great system that will work regardless of what
> database
> backend a user uses.

Why do you think sqlite will make the problems go away?

First I should say that I am in no way trying to convince anyone to switch away from BSDDB if they like it, or prefer it. I am only working on making Gramps more flexible by allowing different backends.

Personally, I do feel that sqlite (rather than BSDDB, mysql, or postgresql) will be a better choice as a Gramps backend, when all is sufficiently tested. Why? I didn't know that it was used in as many projects as Enno pointed out, but my reasons are from personal experience. I've used sqlite for just about everything over the last 10 years, including a live website that has been up continuously for almost 10 years (http://myro.roboteducation.org/robobiblio/). 

* It is fast
* Robust
* Designed for exactly this purpose: https://www.sqlite.org/appfileformat.html

But perhaps the biggest gains are:

* one can do analysis (EXPLAIN QUERY PLAN) to see if we need more indexes
* create additional indexes outside of code
* Pretty easy to learn and "become an expert" at, without diving into low-level gory details.
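For what it's worth, the first two of those gains can be tried from a Python prompt. A minimal sketch (the person table and column names here are hypothetical stand-ins, not the actual Gramps schema):

```python
import sqlite3

# Hypothetical stand-in for a Gramps table; not the real schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
con.executemany("INSERT INTO person VALUES (?, ?)",
                [("h1", "Smith"), ("h2", "Jones")])

# Without an index on surname, the plan is a full table scan...
plan1 = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE surname = ?",
    ("Smith",)).fetchall()
print(plan1)  # detail column reads something like 'SCAN person'

# ...and an index can be created entirely outside the application code,
# after which the same query uses it.
con.execute("CREATE INDEX idx_person_surname ON person (surname)")
plan2 = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE surname = ?",
    ("Smith",)).fetchall()
print(plan2)  # 'SEARCH person USING INDEX idx_person_surname ...'
```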

Further comments below:
 

I expect that sqlite has a selection of things like checkpoints,
before-looks, after-looks, roll-back, roll-forward etc just like BSDDB. I
expect that sqlite is quite a complicated piece of software, just like
BSDDB. I expect sqlite has a whole range of configuration options just like
BSDDB. I expect sqlite has few Gramps developers who have a low level
understanding of the database, just like BSDDB. I expect sqlite has bugs
just like BSDDB. BSDDB is supported by Oracle, so I would expect it to be
fairly robust.

SQLite comes packaged up, unlike BSDDB, where we have to code everything ourselves... lots of choices on flags, page sizes, etc.
 

OK, maybe you will say that sqlite is simpler than BSDDB, but I doubt that
it is the orders of magnitude simpler that will avoid these problems. I
understand that BSDDB is designed for concurrent access, whereas sqlite is
less so. However, we seem to get corruption problems even when there is only
one user accessing the database (opening the same family twice is sometimes
the cause of problems, but not often).

How do sqlite and BSDDB compare in reliability if an application crashes in
the middle of writing a transaction? My guess is that some of the database
corruptions we see are due to errors in finding auxiliary files (hence the
messages about 'environment not found'). Is sqlite likely to be any different
in possible problems here?

I am not against sqlite in general, just why should I believe it is more
reliable?

Don't take anyone's word for it... just try it out and see if it works for you. If it doesn't work out, there are now many DBMSs to pick from.
 


DS Blank wrote
> Here is what I am imagining: we could do a serialized dump (XML or JSON) of
> an object after it changes to a timestamped text file. This file would
> include all of the items that changed in a transaction. So, every file
> would have all of the changed objects from the transaction. Deleted
> objects
> would be marked as well.
>
> So, you have a full backup occasionally, followed by a series of
> transaction files. One can recover any state by starting with a backup,
> and
> "replaying" the transactions from then until a point beyond that. I think
> it is simple, and would be hard to mess up... you might lose a transaction
> (file corruption) but that is fairly robust.

Isn't that what a DBMS does?

Perhaps there are DBMSs that could do these types of functions, but the cost of getting those installed for a cross-platform project is perhaps too high. And part of the goal in my mind is to make this archive data useful even if your amazing DBMS can't open the data any more.
 
And a DBMS is quite a complicated thing to
implement, so why do you think we could implement a reliable transaction
system to produce XML output? I could see all sorts of difficulties in
reliably replicating all updates into XML, and in replaying them. Just as a
set of random problems, what about timestamp accuracy and consistency,
synchronising the full backup with the incremental ones, Gramps database
undo or roll-back, ensuring we get it right every time, multiple threads
etc. Doesn't sound like something that would be easy to get reliable enough.

I'm talking about using our current transaction management system that already does all of the above. The only thing it does not do is stay persistent between sessions (that looks to be a trivial change, BTW; more on that later). Currently, we have a BSDDB of all non-batch-nomagic transactions. Wouldn't it be nice to have that:

* be persistent
* record batch changes (imports)
* be available in non-binary form

There are some details to think about. 
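For the curious, here is a rough sketch of that idea in Python. The file layout and function names are hypothetical illustrations, not an actual Gramps API:

```python
import json
import os
import time

def write_transaction(journal_dir, changes):
    """Append one transaction as a timestamped JSON file.

    `changes` is a list of (handle, object) pairs, where object is any
    JSON-serializable dump of the changed item; None marks a deletion.
    """
    os.makedirs(journal_dir, exist_ok=True)
    # Zero-padded so lexicographic order == chronological order.
    # (A real implementation would also guard against timestamp collisions.)
    fname = os.path.join(journal_dir, "%020d.json" % time.time_ns())
    with open(fname, "w") as f:
        json.dump(changes, f)
    return fname

def replay(journal_dir, state=None):
    """Rebuild state from a base backup by applying transactions in order."""
    state = dict(state or {})
    for fname in sorted(os.listdir(journal_dir)):
        with open(os.path.join(journal_dir, fname)) as f:
            for handle, obj in json.load(f):
                if obj is None:
                    state.pop(handle, None)  # deletion marker
                else:
                    state[handle] = obj
    return state
```

Recovery is then a full backup plus replay() over the files written since; stopping the replay at any filename recovers the state at that moment.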

-Doug
 

Regards,
Tim.





Re: Gramps 5.0 Decisions

Josip
In reply to this post by Tim Lyons
18. 08. 2015. u 18:33, Tim Lyons je napisao/la:

> DS Blank wrote
>> I think you summarize the issue well.
>
> Thank you.
>
>
> DS Blank wrote
>> I think that many of the problems of the past will go away with DBAPI
>> using
>> sqlite3. The file format (probably) won't change, and has a good support
>> of
>> tools. But I want a great system that will work regardless of what
>> database
>> backend a user uses.
>
> Why do you think sqlite will make the problems go away?
>
> I expect that sqlite has a selection of things like checkpoints,
> before-looks, after-looks, roll-back, roll-forward etc just like BSDDB. I
> expect that sqlite is quite a complicated piece of software, just like
> BSDDB. I expect sqlite has a whole range of configuration options just like
> BSDDB. I expect sqlite has few Gramps developers who have a low level
> understanding of the database, just like BSDDB. I expect sqlite has bugs
> just like BSDDB. BSDDB is supported by Oracle, so I would expect it to be
> fairly robust.
>
> OK, maybe you will say that sqlite is simpler than BSDDB, but I doubt that
> it is the orders of magnitude simpler that will avoid these problems. I
> understand that BSDDB is designed for concurrent access, whereas sqlite is
> less so. However, we seem to get corruption problems even when there is only
> one user accessing the database (opening the same family twice is sometimes
> the cause of problems, but not often).
>
> How do sqlite and BSDDB compare in reliability if an application crashes in
> the middle of writing a transaction? My guess is that some of the database
> corruptions we see are due to errors in finding auxiliary files (hence the
> messages about 'environment not found') is sqlite likely to be any different
> in possible problems here?
>
> I am not against sqlite in general, just why should I believe it is more
> reliable?
>
>
> DS Blank wrote
>> Here is what I am imagining: we could do a serialized dump (XML or JSON) of
>> an object after it changes to a timestamped text file. This file would
>> include all of the items that changed in a transaction. So, every file
>> would have all of the changed objects from the transaction. Deleted
>> objects
>> would be marked as well.
>>
>> So, you have a full backup occasionally, followed by a series of
>> transaction files. One can recover any state by starting with a backup,
>> and
>> "replaying" the transactions from then until a point beyond that. I think
>> it is simple, and would be hard to mess up... you might lose a transaction
>> (file corruption) but that is fairly robust.
>
> Isn't that what a DBMS does? And a DBMS is quite a complicated thing to
> implement, so why do you think we could implement a reliable transaction
> system to produce XML output? I could see all sorts of difficulties in
> reliably replicating all updates into XML, and in replaying them. Just as a
> set of random problems, what about timestamp accuracy and consistency,
> synchronising the full backup with the incremental ones, Gramps database
> undo or roll-back, ensuring we get it right every time, multiple threads
> etc. Doesn't sound like something that would be easy to get reliable enough.
>
> Regards,
> Tim.
>

Well said Tim!

We do not know how to use BSDDB, and we do not want to use SQLite properly.
We do not use a DBMS, we use backends. As we do not use them, there is no
need to embed them.
If things do not work, buy a quantum computer; if they still do not work,
pickle it.


--
Josip


Re: Gramps 5.0 Decisions

Tim Lyons
Administrator
In reply to this post by enno
enno wrote
Tim,
> How do sqlite and BSDDB compare in reliability if an application
> crashes in the middle of writing a transaction? My guess is that some
> of the database corruptions we see are due to errors in finding
> auxiliary files (hence the messages about 'environment not found'). Is
> sqlite likely to be any different in possible problems here? I am not
> against sqlite in general, just why should I believe it is more reliable?
Well, first, I believe there are no auxiliary files, or not as many as
with BSDDB. I may have seen a .journal file when working with
RootsMagic, but it basically uses a single file, with a single format,
regardless of OS, CPU architecture, whatever.
Well, there HAVE TO BE auxiliary files, to ensure that the database is protected against all sorts of failures. If there aren't then I would have no interest in using it.

Have a look at:
https://www.sqlite.org/howtocorrupt.html
which gives me the impression that they are implementing a serious DBMS. It talks about all sorts of auxiliary files that are used to maintain reliability and integrity.

Regards,
Tim.

Re: Gramps 5.0 Decisions

enno
Tim,

> enno wrote
>> Tim,
>>> How do sqlite and BSDDB compare in reliability if an application
>>> crashes in the middle of writing a transaction? My guess is that some
>>> of the database corruptions we see are due to errors in finding
>>> auxiliary files (hence the messages about 'environment not found'). Is
>>> sqlite likely to be any different in possible problems here? I am not
>>> against sqlite in general, just why should I believe it is more reliable?
>> Well, first, I believe there are no auxiliary files, or not as many as
>> with BSDDB. I may have seen a .journal file when working with
>> RootsMagic, but it basically uses a single file, with a single format,
>> regardless of OS, CPU architecture, whatever.
> Well, there HAVE TO BE auxiliary files, to ensure that the database is
> protected against all sorts of failures. If there aren't then I would have
> no interest in using it.
>
> Have a look at:
> https://www.sqlite.org/howtocorrupt.html
> which impresses me that they are implementing a serious DBMS. It talks about
> all sorts of auxiliary files that are used to maintain reliability and
> integrity.
Right, I see one sort, which is the journal file mentioned above. I see
that during GEDCOM import, but it disappears once that is done, in which
case the RootsMagic database is a single file again. Similarly, when you
export to SQLite from Gramps, there is an Untitled_1.sql-journal file
when the export is running.

Once done, that journal is gone too, so when not in use, you have a
single file again. There may be a .wal file when you use write-ahead
logging instead of a journal, but since they seem to be mutually
exclusive, "all sorts" really means .journal or .wal.
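That behaviour is easy to verify from a Python prompt; a small sketch (the paths here are hypothetical):

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(db)

# Default mode is a rollback journal: test.db-journal exists only while
# a write transaction is in flight, and is removed afterwards.
print(con.execute("PRAGMA journal_mode").fetchone())  # typically ('delete',)

# In write-ahead-log mode, SQLite instead keeps test.db-wal (and the
# shared-memory file test.db-shm) next to the database for as long as
# a connection is open.
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE t (x INTEGER)")
con.commit()

files = sorted(os.listdir(os.path.dirname(db)))
print(files)  # ['test.db', 'test.db-shm', 'test.db-wal']
```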

regards,

Enno



Re: Gramps 5.0 Decisions

Tim Lyons
Administrator
In reply to this post by DS Blank
DS Blank wrote
First I should say that I am in no way trying to convince anyone to switch
away from BSDDB if they like it, or prefer it. I am only working on making
Gramps more flexible by allowing different backends.


Don't take anyone's word for it... just try it out and see if it works for you. If it doesn't work out, there are now many DBMS's to pick.
 

Like 93% of Gramps users[1], I am not running on Linux, so I don't really have a choice of which DBMS I use; it is decided by the packagers.


I was initially very sceptical about sqlite, especially because I seem to recall that someone on this mailing list said that they did not regard sqlite as a serious DBMS, but only suitable for toy use.

However, given the remarks from Enno (thanks) on how widespread its use is and how robust it seems, and a quick look at some of the sqlite documentation[2], I am beginning to wonder whether we should go for sqlite as the primary DBMS (i.e. as the one that is bundled with Mac and Windows).

We certainly have not had much success with BSDDB; if there is any kind of failure, I would expect a DBMS next time it starts to automatically recover to the last completed transaction. The most you should lose is the action you were doing when Gramps crashed. Instead, we mainly get 'unable to find environment', and possibly one might be able to recover something if you load some special DBMS recovery software (which we will never manage to get Aunt Martha to do, even if the software is available on her platform - I've certainly never managed to do it). It seems the best we are likely to achieve is recovery to the last checkpoint, which seems to be the last time the database was closed before opening it for the session that eventually failed.

I would like performance to roughly match BSDDB. At the moment, the latest figures seem to be that view update takes twice as long with sqlite as with BSDDB. I don't really see why that should be the case. Are we using sqlite properly?

I don't know what the timings for GEDCOM import are at the moment - I seem to recall that there was some question as to whether the import included updating all the indexes or something.

I would also like to be sure that updates (what we called magic updates) are wrapped in a single transaction within sqlite, so that the database is updated as a single action. An example would be a GEDCOM import of more than 1000 people; ensuring that it is a single transaction will be critical to performance. (There were various comments about transactions, and whether we created them ourselves. I don't see how that can possibly work: either you are using the underlying DBMS to do its update as a single transaction, or you are not; I don't think you can do it above the DBMS level.)
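To illustrate with Python's sqlite3 module (the schema below is a hypothetical stand-in): wrapping the whole batch in one transaction makes it atomic, so a failure part-way through leaves nothing behind.

```python
import sqlite3

# Hypothetical stand-in for a Gramps table; not the real schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
rows = [("I%04d" % i, "Smith") for i in range(1000)]

# The whole import in ONE transaction: one commit at the end, and either
# every row lands or none do.
with con:  # the connection as context manager = a single transaction
    con.executemany("INSERT INTO person VALUES (?, ?)", rows)

count = con.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(count)  # 1000

# If any row fails part-way through, the whole batch is rolled back:
try:
    with con:
        con.executemany("INSERT INTO person VALUES (?, ?)",
                        [("X1", "New"), ("I0000", "Duplicate")])
except sqlite3.IntegrityError:
    pass
count = con.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(count)  # still 1000: the 'X1' row was rolled back too
```

On disk, the single transaction also pays only one fsync/commit rather than one per row, which is where the bulk-import speedup comes from.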

Josip says
> We do not know how to use BSDDB and do not want to use SQLite properly.
> We do not use DBMS, we use backends. As we do not use them there is no
> need to embed them.

I agree that we probably don't know how to use BSDDB properly - at least we don't seem to be able to get it to work reliably.


I don't understand the rest of what Josip says. Surely we want to use sqlite properly? I accept that we may not have the skill to use sqlite properly, but we are no worse off in that respect than with BSDDB.

I think we do use a DBMS; we want to ensure that updates are atomic and reliable. I think we use DBMS rollback to undo a transaction when we click on undo. We certainly use this when we want to undo a partial update because we decide something has gone wrong at the application logic level.
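As a minimal illustration with Python's sqlite3 (hypothetical schema), rolling back the in-flight transaction is exactly what the DBMS offers:

```python
import sqlite3

# Hypothetical stand-in for a Gramps table; not the real schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
con.execute("INSERT INTO person VALUES ('h1', 'Smith')")
con.commit()

# An edit happens inside the current transaction...
con.execute("UPDATE person SET surname = 'Smyth' WHERE handle = 'h1'")

# ...and "undo" is simply asking the DBMS to roll it back.
con.rollback()
surname = con.execute(
    "SELECT surname FROM person WHERE handle = 'h1'").fetchone()[0]
print(surname)  # Smith
```

Note this only undoes the transaction still in flight; a multi-level undo history across committed transactions, as Gramps offers, still has to be kept by the application.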



Regards,
Tim.



[1] Current figures for 4.1.3 are:

Mac (Intel + PPC)    9%
Windows (32+64 bit)  84%
Linux                8%

(OK, some people may be downloading from places other than Sourceforge, but the figures are probably roughly right).

[2] Mainly https://www.sqlite.org/howtocorrupt.html which tells me they are serious about corruption.

Re: Gramps 5.0 Decisions

Tim Lyons
Administrator
In reply to this post by DS Blank
DS Blank wrote
On Tue, Aug 18, 2015 at 12:33 PM, Tim Lyons <[hidden email]> wrote:
> DS Blank wrote
> > Here is what I am imagining: we could do a serialized dump (XML or JSON) of
> > an object after it changes to a timestamped text file. This file would
> > include all of the items that changed in a transaction. So, every file
> > would have all of the changed objects from the transaction. Deleted
> > objects
> > would be marked as well.
> >
> > So, you have a full backup occasionally, followed by a series of
> > transaction files. One can recover any state by starting with a backup,
> > and
> > "replaying" the transactions from then until a point beyond that. I think
> > it is simple, and would be hard to mess up... you might lose a
> transaction
> > (file corruption) but that is fairly robust.
>
> Isn't that what a DBMS does?


Perhaps there are DBMSs that could do these types of functions, but the cost
of getting those installed for a cross-platform project is perhaps too high.
And part of the goal in my mind is to make this archive data useful even
if your amazing DBMS can't open the data any more.


> And a DBMS is quite a complicated thing to
> implement, so why do you think we could implement a reliable transaction
> system to produce XML output? I could see all sorts of difficulties in
> reliably replicating all updates into XML, and in replaying them. Just as a
> set of random problems, what about timestamp accuracy and consistency,
> synchronising the full backup with the incremental ones, Gramps database
> undo or roll-back, ensuring we get it right every time, multiple threads
> etc. Doesn't sound like something that would be easy to get reliable
> enough.
>

I'm talking about using our current transaction management system that
already does all of the above. The only thing it does not do is stay
persistent between sessions (that looks to be a trivial change, BTW; more
on that later). Currently, we have a BSDDB of all non-batch-nomagic
transactions. Wouldn't it be nice to have that:

* be persistent
* record batch changes (imports)
* be available in non-binary form

There are some details to think about.
Doug,

I think you misunderstand.

I am not saying 'that is what a DBMS does' so we should find a DBMS that would do it for us.

I am saying 'that is what a DBMS does' so it is something that is really complicated, so it would be really, really difficult to implement.

Does our 'transaction management system' really do all those things? I thought that it handed the transactions off to BSDDB. Transactions that are not non-batch-nomagic would also have to be implemented.

By the way, does the sqlite system implement all the magic transactions? (I.e. does our code pass the transaction element down to sqlite - in other words, are we using sqlite properly?)

Regards,
Tim.