Proposed new database configuration file

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Proposed new database configuration file

Nick Hall
Devs,

At the moment each database resides in a separate directory with an 8
digit hex name (e.g. 59ea21ff).  This contains three configuration
files:  database.txt which contains the backend type (bsddb, sqlite or
postgresql), name.txt which contains the family tree name and
meta_data.db which is used to determine the last modified time.  The
meta_data.db file is an empty file in sqlite and postgresql backends.

I have been experimenting with storing this information in a new
database.ini file, with one section per database.  A bsddb database
would have a section as follows:

[Family Tree 1]
dbtype = bsddb
directory = 59ea21ff
modified = 1508517042

For sqlite we no longer need a directory.  So we can just record a path
to the database file:

[Family Tree 2]
dbtype = sqlite
path = Family_Tree_2.db
modified = 1508517042

For postgresql we only need the connection details:

[Family Tree 3]
dbtype = postgresql
dbname = Family_Tree_3
host = server.mysite.com
port = 12345
modified = 1508517042

Any additional keys will be used as connection parameters.

There were a couple of minor issues.  Renaming an open sqlite database
causes problems, so I suggest only allowing renaming of closed family
trees.  I also recommend removing the ability to specify a family tree
by path from the command line.

One remaining problem is where to store RCS files for databases that no
longer have their own directory.  Perhaps we could create an RCS directory?

Do you like the idea of a database.ini file?

Regards,


Nick.


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

Paul Franklin-5
> Do you like the idea of a database.ini file?

While I have no specific comments about your proposal I
will offer my general opinion that I wish it was going into
"master" -- and better yet had gone into master a year ago.

I dislike doing major development close to a "final" release,
as the amount of testing and use any possible major change
can have will necessarily be limited.  We could get burned.

I believe many projects do not allow major changes when
they are close to a "release candidate" stage, which is
what our "beta" will be, whether we call it that or not.

But I am not the boss.  You are.  You did ask, however.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

Nick Hall
On 21/10/17 09:48, Paul Franklin wrote:
> While I have no specific comments about your proposal I
> will offer my general opinion that I wish it was going into
> "master" -- and better yet had gone into master a year ago.

The proposal is a response to feedback from the alpha releases. I
intended it to go into master, but since it makes changes to the API it
should really go into gramps50.  The load() method now takes a family
tree title instead of a path and the write_version method is no longer
required.

So, the following code:

os.mkdir(path)
with open(os.path.join(path, 'name.txt'), 'w') as f:
     f.write('Family Tree 1')
db = make_database('bsddb')
db.write_version(path)
db.load(path)

becomes:

dbconfig['Family Tree 1'] = {'dbtype': 'bsddb'}
db = make_database('bsddb')
db.load('Family Tree 1')

The load() method now creates the directory and writes it to the
configuration file.

Because I wrote it for master, existing sqlite and postgresql databases
will be automatically upgraded.

Nick.



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

John Ralls-2
In reply to this post by Nick Hall

> On Oct 20, 2017, at 10:16 AM, Nick Hall <[hidden email]> wrote:
>
> Devs,
>
> At the moment each database resides in a separate directory with an 8 digit hex name (e.g. 59ea21ff).  This contains three configuration files:  database.txt which contains the backend type (bsddb, sqlite or postgresql), name.txt which contains the family tree name and meta_data.db which is used to determine the last modified time.  The meta_data.db file is an empty file in sqlite and postgresql backends.
>
> I have been experimenting with storing this information in a new database.ini file, with one section per database.  A bsddb database would have a section as follows:
>
> [Family Tree 1]
> dbtype = bsddb
> directory = 59ea21ff
> modified = 1508517042
>
> For sqlite we no longer need a directory.  So we can just record a path to the database file:
>
> [Family Tree 2]
> dbtype = sqlite
> path = Family_Tree_2.db
> modified = 1508517042
>
> For postgresql we only need the connection details:
>
> [Family Tree 3]
> dbtype = postgresql
> dbname = Family_Tree_3
> host = server.mysite.com
> port = 12345
> modified = 1508517042
>
> Any additional keys will be used as connection parameters.
>
> There were a couple of minor issues.  Renaming an open sqlite database causes problems, so I suggest only allowing renaming of closed family trees.  I also recommend removing the ability to specify a family tree by path from the command line.
>
> One remaining problem is where to store RCS files for databases that no longer have their own directory.  Perhaps we could create an RCS directory?
>
> Do you like the idea of a database.ini file?

Nick,

Partly. I do think that the name and modification timestamp should live somewhere in the database, particularly for pgsql. In that case it's possible to use the database from different machines and keeping external ini files in sync would be difficult. On the other hand Gramps needs to be able to find the database, so some sort of local record is necessary.

You might consider keeping the hex name for SQLite3 files with the name in a table. That relieves you of worrying about whether the database is open or not when changing its name, and running a SQLite3 query is no slower than reading the name file for a BerkeleyDB database. For pgsql it might be slower to run that query for a remote connection, but if it's been changed from another machine there's no way besides querying to see it.

As for RCS files I doubt that they'll be useful for a SQL database. A web search on "version sql data" produces several interesting articles.

Regards,
John Ralls




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

Nick Hall
On 21/10/17 17:55, John Ralls wrote:
> Partly. I do think that the name and modification timestamp should live somewhere in the database, particularly for pgsql. In that case it's possible to use the database from different machines and keeping external ini files in sync would be difficult. On the other hand Gramps needs to be able to find the database, so some sort of local record is necessary.
Yes.  We need a local record of the connection information.  At the
moment, this is stored in files in the directory structure.  To find the
family tree name, database type and connection settings, we read the
name.txt, database.txt and settings.ini files in each directory below
the root database directory (~.gramps/grampsdb).

I am proposing moving all this information into a single configuration file.

This enables us to get rid of the local directories for sqlite and
posgresql databases.  The sqlite database files can be stored in the
root database directory.

>
> You might consider keeping the hex name for SQLite3 files with the name in a table. That relieves you of worrying about whether the database is open or not when changing its name, and running a SQLite3 query is no slower than reading the name file for a BerkeleyDB database. For pgsql it might be slower to run that query for a remote connection, but if it's been changed from another machine there's no way besides querying to see it.

Do we still need a hex name for sqlite and postgresql databases?

I can see that recording the family tree name within a sqlite database
may be useful.  If it is transferred to a memory key and renamed, then
the name is still accessible within the database. In a postgresql
database the family tree name will initially be the same as the database
name.  Does it really matter if one user decides to use a different name
locally?

The last modified time is actually the last accessed time.  On BSDDB the
metadata is written back when the database is closed. On DBAPI databases
we do a "touch meta_data.db" in the close() method.  Storing a timestamp
in the database could be useful when the database is on a server, but we
would have to open each database to read it when opening the family tree
manager.  Is a user-specific timestamp acceptable for now?

Regards,


Nick.



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

John Ralls-2

> On Oct 21, 2017, at 1:47 PM, Nick Hall <[hidden email]> wrote:
>
> On 21/10/17 17:55, John Ralls wrote:
>> Partly. I do think that the name and modification timestamp should live somewhere in the database, particularly for pgsql. In that case it's possible to use the database from different machines and keeping external ini files in sync would be difficult. On the other hand Gramps needs to be able to find the database, so some sort of local record is necessary.
> Yes.  We need a local record of the connection information.  At the moment, this is stored in files in the directory structure.  To find the family tree name, database type and connection settings, we read the name.txt, database.txt and settings.ini files in each directory below the root database directory (~.gramps/grampsdb).
>
> I am proposing moving all this information into a single configuration file.
>
> This enables us to get rid of the local directories for sqlite and posgresql databases.  The sqlite database files can be stored in the root database directory.

Yes, I understand all of that. I think that having three files was a dumb design; entries in settings.ini would have been cleaner and faster (one open() instead of 3). For SQLite3 all of that info can go into a table. No change to the design except that instead of 1a2b3c4d directory you have 1a2b3c4d.db. For pgsql where there's otherwise no presence at all in the Gramps database directory you'd have e5f6a7b8.ini with the database uri and credentials.

Yes, it would be more efficient to just have a single ini file with all of the information because everything gets loaded in a single file access. The cost is that the ini file can potentially get out of sync with the databases.

>
>>
>> You might consider keeping the hex name for SQLite3 files with the name in a table. That relieves you of worrying about whether the database is open or not when changing its name, and running a SQLite3 query is no slower than reading the name file for a BerkeleyDB database. For pgsql it might be slower to run that query for a remote connection, but if it's been changed from another machine there's no way besides querying to see it.
>
> Do we still need a hex name for sqlite and postgresql databases?
>
> I can see that recording the family tree name within a sqlite database may be useful.  If it is transferred to a memory key and renamed, then the name is still accessible within the database. In a postgresql database the family tree name will initially be the same as the database name.  Does it really matter if one user decides to use a different name locally?
>
> The last modified time is actually the last accessed time.  On BSDDB the metadata is written back when the database is closed. On DBAPI databases we do a "touch meta_data.db" in the close() method.  Storing a timestamp in the database could be useful when the database is on a server, but we would have to open each database to read it when opening the family tree manager.  Is a user-specific timestamp acceptable for now?
>

The actual name of the database in pgsql or file in SQLite3 is unimportant to the user just as it is with a BerkeleyDB environment directory. In the case of the SQLite3 filename if it doesn't track the database name that appears in the Family Trees window then there's no restriction on it being open when the user renames it.

I hadn't thought of the use-case of multiple users sharing the same database but each calling it by a different name; I was thinking of the single user with more than one computer who'd be annoyed by having to change the name on each computer. I suspect that in reality databases don't get renamed that often and it's not important. I suspect having access time mean "the last time the database was accessed from this computer" might be a bigger annoyance, both because if I updated something from my laptop yesterday and my desktop says the database was last accessed on Tuesday I might worry that some work had been lost and because I don't really care when the database was last opened, I want to know when the last time an UPDATE or INSERT query was run on it. But I'm not planning to use a server-based DB so it's acceptable to *me* for now--or forever, for that matter--the folks you need to ask are people who are using a DB on a server.

Regards,
John Ralls




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

Paul Franklin-5
In reply to this post by Nick Hall
On 10/21/17, Nick Hall <[hidden email]> wrote:
> On 21/10/17 09:48, Paul Franklin wrote:
>> While I have no specific comments about your proposal I
>> will offer my general opinion that I wish it was going into
>> "master" -- and better yet had gone into master a year ago.
>
> The proposal is a response to feedback from the alpha releases. I
> intended it to go into master, but since it makes changes to the API it
> should really go into gramps50.

I disagree.

The fact that feedback is still happening, and that you are
proposing this, shows that development is still happening.

But the feedback only concerns the DBAPI backends, and this
proposal is similarly oriented, even if BSDDB is included.

All development belongs in master, not a maintenance branch.

We already acknowledged, over a year ago, that the DBAPI
changes are not yet ready for general release.  They need
more testing and any changes should be made in master.  We
have power users right now downloading master from GitHub
and attempting to use SQL-flavor DBs.  That can continue.

There is no reason to make a major change to the API for the
way the BSDDB DB is accessed, stored, or whatever.  It will
impact our "plain vanilla" users and you are essentially not
allowing any time to test it, for feedback to be provided,
and the whole concept you propose to possibly evolve.

What you propose to change is not a bug.  It's a change
in the API, a feature.

You are not alone in wanting your work to go into the next
gramps.  We all feel that way.  We always will.  But I put
"feature" code into master since the gramps50 maintenance
branch was created, and I know other developers have also.

I don't think whatever convenience is granted/enabled by
your change is worth the risk of something unforeseen, this
late in the release cycle.

Stop trying to make major changes to the gramps50 branch.

You yourself told us all to stop thinking about improvements
and to start testing and fixing bugs, so that 5.0.0 could be
released.  There is no reason to stop thinking like that.

I dislike these constantly-moving goalposts for 5.0.0.

It's frozen.  It's in a "maintenance" branch.  Let's release
it, rather than continuously trying to improve it.

Put your change into master, if you think it should be done.

Respectfully.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

prculley
In reply to this post by John Ralls-2
One thing I don't see in this discussion is any mention of db upgrade.  I hope that users will be able to upgrade at least the bsddb db to the new configuration. 

It is probably less critical for dbapi dbs to be upgraded, although I think it is nice to our testers who may be using 5.0 a bit more than recommended.

Beyond that I don't have a current good reason to comment on the proposal.

Paul Culley

On Sat, Oct 21, 2017 at 7:42 PM, John Ralls <[hidden email]> wrote:

> On Oct 21, 2017, at 1:47 PM, Nick Hall <[hidden email]> wrote:
>
> On 21/10/17 17:55, John Ralls wrote:
>> Partly. I do think that the name and modification timestamp should live somewhere in the database, particularly for pgsql. In that case it's possible to use the database from different machines and keeping external ini files in sync would be difficult. On the other hand Gramps needs to be able to find the database, so some sort of local record is necessary.
> Yes.  We need a local record of the connection information.  At the moment, this is stored in files in the directory structure.  To find the family tree name, database type and connection settings, we read the name.txt, database.txt and settings.ini files in each directory below the root database directory (~.gramps/grampsdb).
>
> I am proposing moving all this information into a single configuration file.
>
> This enables us to get rid of the local directories for sqlite and posgresql databases.  The sqlite database files can be stored in the root database directory.

Yes, I understand all of that. I think that having three files was a dumb design; entries in settings.ini would have been cleaner and faster (one open() instead of 3). For SQLite3 all of that info can go into a table. No change to the design except that instead of 1a2b3c4d directory you have 1a2b3c4d.db. For pgsql where there's otherwise no presence at all in the Gramps database directory you'd have e5f6a7b8.ini with the database uri and credentials.

Yes, it would be more efficient to just have a single ini file with all of the information because everything gets loaded in a single file access. The cost is that the ini file can potentially get out of sync with the databases.

>
>>
>> You might consider keeping the hex name for SQLite3 files with the name in a table. That relieves you of worrying about whether the database is open or not when changing its name, and running a SQLite3 query is no slower than reading the name file for a BerkeleyDB database. For pgsql it might be slower to run that query for a remote connection, but if it's been changed from another machine there's no way besides querying to see it.
>
> Do we still need a hex name for sqlite and postgresql databases?
>
> I can see that recording the family tree name within a sqlite database may be useful.  If it is transferred to a memory key and renamed, then the name is still accessible within the database. In a postgresql database the family tree name will initially be the same as the database name.  Does it really matter if one user decides to use a different name locally?
>
> The last modified time is actually the last accessed time.  On BSDDB the metadata is written back when the database is closed. On DBAPI databases we do a "touch meta_data.db" in the close() method.  Storing a timestamp in the database could be useful when the database is on a server, but we would have to open each database to read it when opening the family tree manager.  Is a user-specific timestamp acceptable for now?
>

The actual name of the database in pgsql or file in SQLite3 is unimportant to the user just as it is with a BerkeleyDB environment directory. In the case of the SQLite3 filename if it doesn't track the database name that appears in the Family Trees window then there's no restriction on it being open when the user renames it.

I hadn't thought of the use-case of multiple users sharing the same database but each calling it by a different name; I was thinking of the single user with more than one computer who'd be annoyed by having to change the name on each computer. I suspect that in reality databases don't get renamed that often and it's not important. I suspect having access time mean "the last time the database was accessed from this computer" might be a bigger annoyance, both because if I updated something from my laptop yesterday and my desktop says the database was last accessed on Tuesday I might worry that some work had been lost and because I don't really care when the database was last opened, I want to know when the last time an UPDATE or INSERT query was run on it. But I'm not planning to use a server-based DB so it's acceptable to *me* for now--or forever, for that matter--the folks you need to ask are people who are using a DB on a server.

Regards,
John Ralls




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

Nick Hall
On 22/10/17 15:19, Paul Culley wrote:
> One thing I don't see in this discussion is any mention of db
> upgrade.  I hope that users will be able to upgrade at least the bsddb
> db to the new configuration.
>
> It is probably less critical for dbapi dbs to be upgraded, although I
> think it is nice to our testers who may be using 5.0 a bit more than
> recommended.

My prototype code creates a database.ini from the existing directory
structure if it doesn't already exist.

Nick.



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

Nick Hall
In reply to this post by John Ralls-2
On 22/10/17 01:42, John Ralls wrote:
> Yes, I understand all of that. I think that having three files was a dumb design; entries in settings.ini would have been cleaner and faster (one open() instead of 3). For SQLite3 all of that info can go into a table. No change to the design except that instead of 1a2b3c4d directory you have 1a2b3c4d.db. For pgsql where there's otherwise no presence at all in the Gramps database directory you'd have e5f6a7b8.ini with the database uri and credentials.
We need to know the database type, currently stored in "database.txt"
before we open a database.  We could assume that a directory always
contains a bsddb database and a ".db" extension is always a sqlite
database, but that may restrict us in the future.  I considered a
separate "settings.ini" for each database, but a single "database.ini"
seemed neater.

Storing the family tree name in the database is a good suggestion, but
in the case of postgresql I think we need a local name in case the
connection fails.  We can then mark a tree as unavailable by name.

Also, at the moment we can easily see which family tree is in which
directory with a command like:

grep -R --include='name.txt' . .

A single ini file would quickly provide the same information.

>
> Yes, it would be more efficient to just have a single ini file with all of the information because everything gets loaded in a single file access. The cost is that the ini file can potentially get out of sync with the databases.

I don't see the family tree name getting out of sync as a problem.  For
postgresql, the family tree name is used initially to provide the
database name.  Users will probably not change this anyway.

However, I agree with you that storing the last modified timestamp in
the ini file is a problem.  In a multi-user database knowing who made
the last modification would also be useful. Perhaps storing this
information in a database table after every transaction commit would
provide an acceptable solution.  We could create a new
get_last_modified() database method.

Regards,


Nick.



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: Proposed new database configuration file

Nick Hall
In reply to this post by Paul Franklin-5
On 22/10/17 03:17, Paul Franklin wrote:
> All development belongs in master, not a maintenance branch.

No.  New features and enhancements belong in the master branch.

This proposal attempts to fix a design problem.

Nick.



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel