Raw export format?

Ron Johnson
Hi,

(Installed v4.0.3 from sf.net.)

What exactly is it?  From looking at a file in vim, it appears that the code
simply goes through each database, printing records in Python "record"
format (hence the name "raw").

Then the next question is: what's it for?  Debugging?  On my primary tree,
it's 6% smaller than the associated xml dump, but xz compresses the xml 10%
more than the Python dump.

Speaking of exports: presumably the developers at some point tried using
Pickles as a backup format?

Ron

--
My word, man!  Don't you know your quantum statistics?


------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
_______________________________________________
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users

Re: Raw export format?

Nick Hall
On 13/06/14 05:47, Ron Johnson wrote:
> Hi,
>
> (Installed v4.0.3 from sf.net.)
>
> What exactly is it?  From looking at a file in vim, it appears that the code
> simply goes through each database, printing records in Python "record"
> format (hence the name "raw").

Yes.

> Then the next question is: what's it for?  Debugging?  On my primary tree,
> it's 6% smaller than the associated xml dump, but xz compresses the xml 10%
> more than the Python dump.

It was probably written for debugging.  I can't find an equivalent import.

Only values are written, so you would have to look at the code of the
Gramps objects to understand the meaning of each value.  The citation
table is also missing in the raw export.


> Speaking of exports: presumably the developers at some point tried using
> Pickles as a backup format?

I don't think so.  We use a pickled format in the database.

Each Gramps object has a serialize and unserialize method.  A primary
object is serialized together with the secondary objects it contains.  
It is this serialized version that is stored in the database.  The raw
export allows us to see this.
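
A minimal sketch of the scheme described above, using hypothetical Person and Name classes rather than the actual Gramps objects. It only illustrates the idea that a primary object serializes itself together with the secondary objects it contains, and that the resulting tuple of bare values is what gets pickled and stored:

```python
import pickle


class Name:
    """Hypothetical secondary object (not the real Gramps class)."""

    def __init__(self, first="", surname=""):
        self.first = first
        self.surname = surname

    def serialize(self):
        # Only values are emitted; the field names are implied by position.
        return (self.first, self.surname)

    def unserialize(self, data):
        self.first, self.surname = data
        return self


class Person:
    """Hypothetical primary object that contains a secondary Name."""

    def __init__(self, handle="", name=None):
        self.handle = handle
        self.name = name or Name()

    def serialize(self):
        # The contained secondary object is serialized inline.
        return (self.handle, self.name.serialize())

    def unserialize(self, data):
        self.handle, name_data = data
        self.name = Name().unserialize(name_data)
        return self


person = Person("H0001", Name("Ada", "Lovelace"))
raw = person.serialize()          # plain tuple of values, no field names
blob = pickle.dumps(raw)          # bytes that a key/value store could hold
restored = Person().unserialize(pickle.loads(blob))
print(raw)
```

Because the tuple carries no field names, a reader has to consult the class definitions to know what each position means, which matches the caveat about the raw export.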

The Gramps XML is more human-readable and should be used for backups.


Nick.



Re: Raw export format?

Ron Johnson
In reply to this post by Ron Johnson
On 06/15/2014 12:28 PM, Nick Hall wrote:
[snip]
> It was probably written for debugging. I can't find an equivalent import.
> Only values are written, so you would have to look at the code of the
> Gramps objects to understand the meaning of each value. The citation table
> is also missing in the raw export.

Save some code complexity by ripping it out?

> I don't think so.  We use a pickled format in the database.
>
> Each Gramps object has a serialize and unserialize method.  A primary
> object is serialized together with the secondary objects it contains.
> It is this serialized version that is stored in the database.

DB as a "BLOB storage device"?  Interesting...

> The Gramps XML is more human-readable and should be used for backups.

I do.  My curiosity was just piqued by the format, though.

--
My word, man!  Don't you know your quantum statistics?



Re: Raw export format?

DS Blank
On Sun, Jun 15, 2014 at 1:39 PM, Ron Johnson <[hidden email]> wrote:
> On 06/15/2014 12:28 PM, Nick Hall wrote:
> [snip]
>> It was probably written for debugging. I can't find an equivalent import.
>> Only values are written, so you would have to look at the code of the
>> Gramps objects to understand the meaning of each value. The citation table
>> is also missing in the raw export.

Yes, it was for debugging. When working on database changes, it is sometimes nice to get a raw dump of the data. It would be trivial to write an importer, but it wasn't meant to be a real export format.
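
To illustrate why an importer for such a dump could be short, here is a sketch under a stated assumption: that each record is written as one line holding the repr() of a tuple of plain Python values. The actual layout of the Gramps raw file is not confirmed here, and write_raw and read_raw are hypothetical helper names.

```python
import ast
import io


def write_raw(records, fp):
    """Write one repr() per line (assumed layout, not the Gramps exporter)."""
    for record in records:
        fp.write(repr(record) + "\n")


def read_raw(fp):
    """Parse each non-empty line back into a Python value, without eval()."""
    for line in fp:
        line = line.strip()
        if line:
            yield ast.literal_eval(line)


records = [
    ("H0001", ("Ada", "Lovelace"), 1815, None),
    ("H0002", ("Charles", "Babbage"), 1791, ["note-1"]),
]

buffer = io.StringIO()
write_raw(records, buffer)
buffer.seek(0)
restored = list(read_raw(buffer))
print(restored == records)
```

ast.literal_eval only accepts literals (tuples, lists, strings, numbers, None and so on), so it is a safer choice than eval() for reading a dump of this kind.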
 
> Save some code complexity by ripping it out?

Perhaps, but it is very short, and stand-alone.
 
>> I don't think so.  We use a pickled format in the database.
>>
>> Each Gramps object has a serialize and unserialize method.  A primary
>> object is serialized together with the secondary objects it contains.
>> It is this serialized version that is stored in the database.
>
> DB as a "BLOB storage device"?  Interesting...

Yes, in fact the Django database uses the blob format to keep speed up and maintain compatibility. Since we use the database mostly for these blobs, it should be fairly easy to swap out bsddb at some point, even if we use a relational database to store the blobs.
 
>> The Gramps XML is more human-readable and should be used for backups.
>
> I do.  My curiosity was just piqued by the format, though.

BTW, one could also export to JSON by calling .to_struct() on these objects as you write them out. That would be an interesting format between raw and XML.
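
As a rough sketch of what such a JSON export could look like: the classes below are hypothetical stand-ins, and their to_struct() methods are written here for illustration only. The return shape of the real Gramps method is an assumption.

```python
import json


class Name:
    """Hypothetical secondary object with a to_struct() method."""

    def __init__(self, first, surname):
        self.first = first
        self.surname = surname

    def to_struct(self):
        # Unlike the raw tuple, the struct keeps the field names.
        return {"first": self.first, "surname": self.surname}


class Person:
    """Hypothetical primary object that nests its secondary objects."""

    def __init__(self, handle, name):
        self.handle = handle
        self.name = name

    def to_struct(self):
        return {"handle": self.handle, "name": self.name.to_struct()}


people = [
    Person("H0001", Name("Ada", "Lovelace")),
    Person("H0002", Name("Charles", "Babbage")),
]

# Export every object as a labelled structure, then read it back.
text = json.dumps([p.to_struct() for p in people], indent=2, sort_keys=True)
decoded = json.loads(text)
print(text)
```

The output sits between the two existing formats: it carries field names like the XML, but maps directly onto the nested values seen in the raw dump.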

-Doug
 

--
My word, man!  Don't you know your quantum statistics?



Re: Raw export format?

Ron Johnson
In reply to this post by Ron Johnson
On 06/15/2014 09:53 PM, Doug Blank wrote:
> On Sun, Jun 15, 2014 at 1:39 PM, Ron Johnson <[hidden email]> wrote:
> [snip]
>> DB as a "BLOB storage device"?  Interesting...
>
> Yes, in fact the Django database uses the blob format to keep speed up and maintain compatibility. Since we use the database mostly for these blobs, it should be fairly easy to swap out bsddb at some point, even if we use a relational database to store the blobs.

So, no querying with sqlite3 from the bash prompt...  :(

 
>>> The Gramps XML is more human-readable and should be used for backups.
>>
>> I do.  My curiosity was just piqued by the format, though.
>
> BTW, one could also export to JSON by calling .to_struct() on these objects as you write them out. That would be an interesting format between raw and XML.

What's the size comparison of a JSON dump vs. XML?  If it's noticeably smaller, and an importer is easy to write, would it be a better backup format?
-- 
My word, man!  Don't you know your quantum statistics?


Re: Raw export format?

DS Blank
On Mon, Jun 16, 2014 at 12:24 AM, Ron Johnson <[hidden email]> wrote:
> On 06/15/2014 09:53 PM, Doug Blank wrote:
>> On Sun, Jun 15, 2014 at 1:39 PM, Ron Johnson <[hidden email]> wrote:
>> [snip]
>>> DB as a "BLOB storage device"?  Interesting...
>>
>> Yes, in fact the Django database uses the blob format to keep speed up and maintain compatibility. Since we use the database mostly for these blobs, it should be fairly easy to swap out bsddb at some point, even if we use a relational database to store the blobs.
>
> So, no querying with sqlite3 from the bash prompt...  :(

Yes, you can. The webapp keeps the pickled blob, but also stores the data relationally. It's the best of both, even if it uses more space.
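
The "blob plus relational columns" idea can be sketched with Python's sqlite3 module. The table layout, column names and sample records below are invented for illustration and are not the webapp's actual schema:

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " handle TEXT PRIMARY KEY,"
    " surname TEXT,"
    " birth_year INTEGER,"
    " data BLOB)"          # pickled copy of the whole record
)

records = [
    ("H0001", ("Ada", "Lovelace"), 1815),
    ("H0002", ("Charles", "Babbage"), 1791),
]

for record in records:
    handle, (first, surname), birth_year = record
    # Store queryable columns next to the opaque pickled blob.
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?, ?)",
        (handle, surname, birth_year, pickle.dumps(record)),
    )

# Plain SQL works on the relational columns...
rows = conn.execute(
    "SELECT handle FROM person WHERE birth_year < 1800 ORDER BY handle"
).fetchall()

# ...while the blob restores the full record in one step.
blob = conn.execute(
    "SELECT data FROM person WHERE handle = ?", ("H0001",)
).fetchone()[0]
full_record = pickle.loads(blob)
print(rows, full_record)
```

The duplication is the space cost mentioned above: every value in the relational columns is also present inside the blob.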

 
>>>> The Gramps XML is more human-readable and should be used for backups.
>>>
>>> I do.  My curiosity was just piqued by the format, though.
>>
>> BTW, one could also export to JSON by calling .to_struct() on these objects as you write them out. That would be an interesting format between raw and XML.
>
> What's the size comparison of a JSON dump vs. XML?  If it's noticeably smaller, and an importer is easy to write, would it be a better backup format?

The JSON export doesn't exist yet, and neither does the import. Uncompressed JSON would probably be smaller than uncompressed XML, though who knows how well it would compress. But the difference in size between it and XML would be negligible. The important aspect of a backup is: can you restore from it over time?
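
For anyone who wants to measure rather than guess, here is a small sketch that compares raw and xz-compressed sizes of the same toy data encoded as JSON and as XML, and then checks that the compressed JSON restores intact. The data and element names are synthetic, so the numbers it prints say nothing about a real Gramps tree.

```python
import json
import lzma
import xml.etree.ElementTree as ET

# Synthetic records; a real tree would have far more varied content.
people = [
    {"handle": f"H{i:04d}", "first": "Ada", "surname": "Lovelace", "birth_year": 1815}
    for i in range(1000)
]

json_bytes = json.dumps(people).encode("utf-8")

root = ET.Element("people")
for p in people:
    node = ET.SubElement(root, "person", handle=p["handle"])
    ET.SubElement(node, "first").text = p["first"]
    ET.SubElement(node, "surname").text = p["surname"]
    ET.SubElement(node, "birth_year").text = str(p["birth_year"])
xml_bytes = ET.tostring(root, encoding="utf-8")

json_xz = lzma.compress(json_bytes)
xml_xz = lzma.compress(xml_bytes)

sizes = {
    "json": len(json_bytes),
    "json.xz": len(json_xz),
    "xml": len(xml_bytes),
    "xml.xz": len(xml_xz),
}
for label, size in sizes.items():
    print(f"{label:8s} {size:8d} bytes")

# The restore check matters more than the byte counts.
restored = json.loads(lzma.decompress(json_xz))
print("restore ok:", restored == people)
```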

-Doug
 

-- 
My word, man!  Don't you know your quantum statistics?
