why is reference_map.db such a huge file!?


TJMcK
With my very limited knowledge of dbs and how they work (from my recent reading of Oracle related docs), the speed of a database (and gramps) and how fast it can index is directly proportional to the size of the tables.  And to speed up gramps it may be necessary to split or partition tables (which may be a huge or impossible task??)
 
Now, is it correct to say that when I see a file like reference_map.db, and it's double the size of a couple other files and nearly 10x larger than the average db file, that there is a bottleneck here? (As well, this file is the most accessed db file - almost constant activity).

Or quite possibly I just need to understand something else about this reference_map.db file.. and why it's so large... Is it large for other users too?

Re: why is reference_map.db such a huge file!?

enno
Tim,

> With my very limited knowledge of dbs and how they work (from my recent
> reading of Oracle related docs), the speed of a database (and gramps) and
> how fast it can index is directly proportional to the size of the tables.
> And to speed up gramps it may be necessary to split or partition tables
> (which may be a huge or impossible task??)
>    
> Now, is it correct to say that when I see a file like reference_map.db, and
> it's double the size of a couple other files and nearly 10x larger than the
> average db file, that there is a bottleneck here? (As well, this file is the
> most accessed db file - almost constant activity).
>
> Or quite possibly I just need to understand something else about this
> reference_map.db file.. and why it's so large... Is it large for other users
> too?
It is large here too, and to me the name suggests that this one, and the
one named referenced_map.db, connect all the other ones. That would also
explain why it is accessed that much.

If that is true, one may expect that this file grows with the number of
connections between persons, events, sources, and so forth, and it may
indeed be a candidate for optimizing.
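
For anyone who wants to compare on their own tree, here is a minimal shell sketch. It assumes the tree's tables are the *.db files sitting in one directory; the function name list_db_sizes is made up for illustration and is not part of Gramps.

```shell
# List the *.db files in a directory, largest first (sizes in bytes).
# Usage: list_db_sizes DIR
list_db_sizes() {
    dir=${1:-.}
    # -l long listing, -S sort by file size with the largest file first
    ls -lS "$dir"/*.db
}
```

Call it with the folder that holds the tree, for example the per-tree directory under ~/.gramps/grampsdb.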

And I bet you're right saying that splitting or partitioning it may be a
huge task ...

regards,

Enno


------------------------------------------------------------------------------
"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.  Get
unparalleled scalability from the best Selenium testing platform available.
Simple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
_______________________________________________
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users

Re: why is reference_map.db such a huge file!?

Bruce Moore
In reply to this post by TJMcK
Oracle/DB2/Postgres/MySQL etc are designed for dramatically (repeat,
dramatically) larger databases where you can spend tens (or hundreds) of
thousands of dollars on a software license.

Gramps uses Berkeley DB (BDB), a common (and good) choice for
embedded applications.  The partitioning functions discussed in the
Oracle documentation are generally not going to be available on BDB and
thus on Gramps.  It is also important to note that the speed of building
an index and the speed of accessing a table via an index are two
completely different things: building an index may take several hours,
while a random access will take a few milliseconds.  Building an index is
definitely related to the size; accessing an index is generally
unrelated to size for all practical purposes.
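
A rough way to see why lookups barely slow down as a table grows, assuming a B-tree index: a lookup reads about one page per tree level, and the number of levels grows only logarithmically with the number of keys. The sketch below assumes, purely for illustration, about 100 keys per page (the real figure depends on page size and key size); btree_height is a hypothetical helper, not a BDB utility.

```shell
# Estimate how many levels a B-tree needs to index KEYS entries
# when each page holds FANOUT keys.
# Usage: btree_height KEYS FANOUT
btree_height() {
    keys=$1
    fanout=$2
    height=0
    capacity=1
    # Each extra level multiplies the number of reachable keys by FANOUT.
    while [ "$capacity" -lt "$keys" ]; do
        capacity=$((capacity * fanout))
        height=$((height + 1))
    done
    echo "$height"
}

btree_height 1000000 100      # prints 3
btree_height 100000000 100    # prints 4
```

Under these assumptions, a table a hundred times larger costs one extra page read per lookup.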
 
I doubt that BDB offers partitioning.

I don't know the Gramps entity relationship model, but I suspect that
the reference map table contains all of the pointers between people and
people, people and citations, citations and sources, people and places
etc., so it probably has at least one row for each row in every other
table in the database.  It doesn't surprise me that it is quite large.
Since all of the accesses likely include the relationship type and the
object id, all of the accesses will be index-based, and very, very
fast.  If Gramps were doing table scans for everything, it would be so
slow as to be completely unusable.

If you are interested in improving the database performance, I would
read up on the BDB DB_CONFIG file and see if there are ways to change
caching, page size, and locking.  In most embedded applications the
development choices are based upon settings that will run in all
environments, rather than high performance.  You may be able to change
some settings that will take advantage of more memory on your machine
(but which might not run at all on a smaller box).
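
A minimal sketch of what such a file might look like, assuming the standard DB_CONFIG directives set_cachesize, set_lk_max_locks and set_lk_max_objects. The values are placeholders rather than recommendations, write_db_config is a made-up helper, and whether Gramps honours the file is not something I can confirm. Page size is not covered here.

```shell
# Write a DB_CONFIG file into a Berkeley DB environment directory.
# Usage: write_db_config ENV_DIR CACHE_BYTES
write_db_config() {
    env_dir=$1
    cache_bytes=$2
    # set_cachesize takes <gbytes> <bytes> <number of cache regions>.
    # The lock limits below are illustrative values only.
    cat > "$env_dir/DB_CONFIG" <<EOF
set_cachesize 0 $cache_bytes 1
set_lk_max_locks 25000
set_lk_max_objects 25000
EOF
}
```

For example, write_db_config /path/to/tree 268435456 would ask for a 256 MiB cache. Back up the tree first and try it on a copy.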

You might investigate utilities for reorganizing (sorting) tables and/or
indexes.  I don't know if BDB offers these capabilities.

Read a recent thread on loading very large GEDCOM files for a starting
place.

Bruce Moore

On 04/28/2014 02:31 PM, TJMcK wrote:

> [snip]
> --
> View this message in context: http://gramps.1791082.n4.nabble.com/why-is-reference-map-db-such-a-huge-file-tp4665778.html
> Sent from the GRAMPS - User mailing list archive at Nabble.com.

Re: why is reference_map.db such a huge file!?

TJMcK
Thanks Bruce... lots of interesting stuff... gets me on track with my reading.  I made the assumption that since the db_utilities that I downloaded from Oracle worked with BDB, that Oracle was a variation of BDB...

As for optimizing gramps with DB_CONFIG... I did spend quite a bit of time testing gramps with this file, but the only noticeable speed improvement was on gramps startup. I even tried setting cache to 0 bytes and gramps ran as normal.  But maybe I'll spend some more time researching this...  Another concern that I have is, that gramps may not be coded to use the DB_CONFIG file for all the parts (modules??) that could use extra RAM? (Again, I thought that what I read regarding the DB_CONFIG file, was in the Oracle documentation... so I'll check this again...)

Something that would be of interest would be putting just the reference_map.db in a ram disk. Would there be any way to "link" a ram disk and the database folder to get gramps to think it's one and the same?  Then I could write a script to copy the reference file into the ram before loading gramps and copy it back into the hard drive db folder after gramps closes.  (And, since I discovered that linux already uses a hidden RAM disk, I'd just use that for the reference db file.)

Re: why is reference_map.db such a huge file!?

Ron Johnson
In reply to this post by Bruce Moore
On 04/28/2014 07:28 PM, TJMcK wrote:
[snip]
> Something that would be of interest would be putting just the
> reference_map.db in a ram disk. /Would there be any way to "link" a ram disk
> and the database folder to think it's one and the same?/
unionfs might be what you're looking for.

>    Then I could write
> a script to copy the reference file into the ram before loading gramps and
> copy it back into the harddrive db folder after gramps closes.  (And, since
> I discovered that linux already uses a hidden RAM disk, I'd just use that
> for the reference db file.)

Why not just mv the file to the ramdisk then soft symlink it back to the db
directory?

--
"Mathematics deals exclusively with the relations of concepts to each
other without consideration of their relation to experience."
Albert Einstein



Re: why is reference_map.db such a huge file!?

TJMcK
I knew nothing about symlinks before you mentioned it... But from what I've read it would have to be a "fast symlink".  It doesn't appear to be a simple process from what I've seen. So far I haven't found any clear info about this, for an amateur such as I.  I will continue to research this option as this seems to be a good option for storing reference_map.db (or even all the map.db files).  And of course I need to test this setup to see if it will even work...

Re: why is reference_map.db such a huge file!?

Ron Johnson
In reply to this post by Ron Johnson
I've been using Linux for 14 years and have never heard of "fast symlinks".

[pause]

Ah, it's a change in the internal format in which the FS stores link info. It's
so old that it's the only way I've ever seen symlinks stored.


As for difficulty... pish. Couldn't be simpler!

Let's pretend that the RAM disk is mounted at "~/ramdisk". Then, from a
command window, what you do is:
## Set up (with Gramps closed)
cd ~/.gramps/grampsdb/{hex-string}
mv reference_map.db ~/ramdisk/
ls -aFl ~/ramdisk ## TO VERIFY
ln -s ~/ramdisk/reference_map.db reference_map.db
ls -aFl reference_map.db ## TO VERIFY. (Will be *tiny*)

## DO YOUR GENEALOGY STUFF HERE
gramps

## Cleanup (after Gramps has exited)
rm reference_map.db ## removes only the symlink
mv ~/ramdisk/reference_map.db .
ls -aFl reference_map.db ## TO VERIFY. Should be huge.


On 04/29/2014 03:05 PM, TJMcK wrote:
> [snip]


Re: why is reference_map.db such a huge file!?

Tom Samstag
First-time poster here, but I found it important enough to mention that if you're talking about
storing your db in a ramdisk (such as a /dev/shm style disk), remember that there is no persistence.
While Ron's directions would work, if your OS crashed or you ran out of power or something else
caused your computer to power off while you were in the "GENEALOGY STUFF" period, you'd be left with
a dangling symlink and no database.
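
One way to narrow that window, as a sketch only: copy the file to the RAM disk instead of moving it, and keep the on-disk original under another name until the session ends cleanly. The function name, the .disk suffix and the directory arguments are made up for illustration. Changes made during a session are still lost if the machine goes down, but the pre-session file survives.

```shell
# Run a command with reference_map.db served from a RAM disk,
# keeping the pre-session copy on disk until the command has finished.
# Usage: run_with_ramdisk DB_DIR RAM_DIR COMMAND [ARGS...]
run_with_ramdisk() {
    rr_db=$1
    rr_ram=$2
    shift 2
    rr_file=reference_map.db

    # Copy (not move) into the RAM disk, then park the original on disk.
    cp -p "$rr_db/$rr_file" "$rr_ram/$rr_file" || return 1
    mv "$rr_db/$rr_file" "$rr_db/$rr_file.disk" || return 1
    ln -s "$rr_ram/$rr_file" "$rr_db/$rr_file" || return 1

    # Run the program (for example gramps) and remember its exit status.
    "$@"
    rr_status=$?

    # Swap the symlink for the updated file; only then drop the spare copies.
    rm "$rr_db/$rr_file" &&
        cp -p "$rr_ram/$rr_file" "$rr_db/$rr_file" &&
        rm "$rr_ram/$rr_file" "$rr_db/$rr_file.disk"
    return "$rr_status"
}
```

After a crash, recovery would be to delete the dangling symlink and rename reference_map.db.disk back to reference_map.db. Try it on a copy of the tree first.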

On 2014-04-29 15:53, Ron Johnson wrote:

> [snip]


Re: why is reference_map.db such a huge file!?

Ron Johnson
In reply to this post by Ron Johnson
Such is the risk of using RAM disks. But really, when was the last time your
Linux PC froze or hung? In my case, it's been a *long* time.

Anyway, because you're a good and conscientious user, you make frequent
backups, so in the very unlikely case that your PC crashes, you won't have
lost a huge amount.

On 04/29/2014 06:06 PM, Tom Samstag wrote:

> [snip]



Re: why is reference_map.db such a huge file!?

Ken
In reply to this post by Tom Samstag
That would not seem to be the case.

I moved all three *map.db files out of the way then loaded my small
database in Gramps.
The database opened fine and the three files were recreated but of a
much smaller size.

I have no idea what these files do or if they are actually loaded at
start up.

I'll have to experiment with my truly huge database.

Ken.


On 29/04/14 04:06 PM, Tom Samstag wrote:
> first time poster here, but I found it important enough to mention that if you're talking about
> storing your db in a ramdisk (such as a /dev/shm style disk) remember that there is no persistence.
> While Ron's directions would work, if your OS crashed or you ran out of power or something else
> caused your computer to power off while you were in the "GENEALOGY STUFF" period, you'd be left with
> a dangling symlink and no database.

Re: why is reference_map.db such a huge file!?

TJMcK
I have done a fair bit of testing with putting some or all of the db files in a ramdisk.

Conclusion: for the hardware that I use, putting any or all of the db in a ramdisk gives little or no improvement in speed!
Caution: I hope that anyone who tried this made backups, because there is a high probability of corruption (I have posted a bug).  The files that you move between different folders appear to get corrupted with low-level errors...  only your backup will save the testing db.

I've really appreciated the feedback about "large db files"... unfortunately I didn't make any great discoveries to increase the speed of gramps.  But now, my next step will be to attempt to profile gramps...  I've installed a couple of Python profilers and KCachegrind to see what they can tell me.  Do the developers do this on a regular basis?  Has anyone ever done this?  Maybe I should start a new topic?
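
As a starting point for the profiling, here is a sketch using the cProfile and pstats modules from the Python standard library. It assumes a python3 interpreter on the PATH; the helper names are made up, and the path to the Gramps launcher script is left open since it depends on the installation.

```shell
# Profile a Python program and save the raw statistics to OUT_FILE.
# Usage: profile_python OUT_FILE SCRIPT [ARGS...]
profile_python() {
    pp_out=$1
    shift
    python3 -m cProfile -o "$pp_out" "$@"
}

# Print the N entries with the highest cumulative time (default 20).
# Usage: top_functions PROF_FILE [N]
top_functions() {
    python3 -c 'import pstats, sys
stats = pstats.Stats(sys.argv[1])
stats.sort_stats("cumulative").print_stats(int(sys.argv[2]))' "$1" "${2:-20}"
}
```

Typical use would be profile_python gramps.prof followed by the launcher script, then top_functions gramps.prof 20. KCachegrind needs the data in callgrind format, so the .prof file has to go through a converter such as pyprof2calltree first.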