Re : Gramps and Ancestry hints - a proof of concept

jerome
Hi,


Some bug reports and feature requests related to Ancestry have been filed
(#8332 #9249).

The reporters wrote complete and clear reports, but acting on them could lead to additional
issues (#1727 #8191). For now, we may find a hack (#6941) or add a quick warning (#9298),
and we might generate some new sections via the Gedcom extensions addon[1].

I do not use Ancestry myself and do not want to break the current gedcom file format support.
So, I do not know what should be done, because it sounds like a game for Ancestry!
(#9298~46910). Also, the gedcom file format is not my favourite playground.

If any expert on the "Ancestry-Gramps" round-trip, a gedcom wizard, or anyone else
could test or review the patch on #9249 too, that would be great.


[1] https://gramps-project.org/wiki/index.php?title=GEDCOM_Extensions


Thanks!
Jérôme


--------------------------------------------
On Tue 15.3.16, Tom Samstag <[hidden email]> wrote:

 Subject: [Gramps-devel] Gramps and Ancestry hints - a proof of concept
 To: [hidden email]
 Date: Tuesday, 15 March 2016, 21:52
 
 The story:
 I want to utilize Ancestry hints in my research. My research is relatively new and incomplete, and
 there are likely many records I haven't yet cited. I use Gramps as my primary tree management
 program. In using Ancestry hints, I'd like to be guided toward records I haven't yet used, but I
 have no interest in managing a tree on Ancestry. Any new records that Ancestry helps me to find, I
 will attach to my tree in Gramps in the same way I've been. So any data transfer between Gramps and
 Ancestry need only be one-way; I don't need to round-trip any data out of Ancestry. I'd just like
 Ancestry to be able to analyze my data and guide me, not become my platform.
 
 The problem:
 So I can export a gedcom from Gramps and upload it to Ancestry. Doing so will give me hints, but
 most of those hints will be noise. That's because Ancestry has no way of knowing that the citation I
 have to a source titled "United States Census, 1920" is the same as a specific record in their
 database. So most of the hints that Ancestry will give will be duplicates of what I already have.
 
 Ancestry can get that information through a gedcom upload though. It uses a proprietary tag _APID
 that references the database id and the record id. So if we could somehow enter that data into
 Gramps and get it to export it into gedcom, we'd be good.
 
 So one problem is where in the Gramps object hierarchy to store that info. The way the
 information is traditionally organized (e.g. through the census/forms addons) is n people in the
 same event, that event having one citation to the specific source. The APID value is distinct for
 each person in the event, but each person can be in multiple cited events.
 
 The proof of concept:
 So I've created a proof of concept to tackle this problem. It consists of some changes to the gedcom
 exporter, and an optional gramplet to make data entry easier.
 
 Each Gramps source should map to a given Ancestry database ID. So each source has an attribute with
 that ID.
 
 Then, each person has attributes that reference their record number for a given database ID. For
 example (attribute names are likely to change):
 
 =============================
 Source: US Census, 1920
    attribute: Ancestry DBID = 6061
 
 Citation: Pennsylvania, Allegheny County, Pittsburgh City, Pittsburgh, Ward 18, sheet 10A, family
 227, Henry Watzlaf household
 
 Event: Census event
 
 Person: Henry George Watzlaf
    attribute: Ancestry APID H:6061 = 49733277
 =============================
 
 Henry George Watzlaf will have other similar attributes for his records in other databases, and the
 other people that appear in the 1920 census will have an attribute of "Ancestry APID H:6061" with
 different values.
 
 On export, the CENS (census) gedcom record will have a SOUR (source) record which contains the line
 _APID 1,6061::49733277. When this gedcom is used to create a tree on Ancestry, that line will be a
 reference to the Ancestry record at
 http://search.ancestry.com/cgi-bin/sse.dll?indiv=1&dbid=6061&h=49733277
 and the hint will no longer be given, since Ancestry will understand that I've already cited that
 record.
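
The mapping from the two attributes to the exported line can be sketched as follows. This is only an illustration using plain dictionaries, not the actual exporter change: the function names are hypothetical, the attribute names are copied from the example above (and may change), the leading "1," is taken verbatim from the example line, and the gedcom level number is an assumption that depends on where the SOUR record is nested.

```python
from urllib.parse import urlencode

# Attribute names copied from the example above; the author notes they may change.
SOURCE_DBID_ATTR = "Ancestry DBID"
PERSON_APID_ATTR_PREFIX = "Ancestry APID H:"


def apid_value(source_attrs, person_attrs):
    """Return the _APID value for one person cited from one source, or None.

    Both arguments are plain dicts of attribute name -> value (a stand-in
    for real Gramps objects, which are not used here).
    """
    dbid = source_attrs.get(SOURCE_DBID_ATTR)
    if dbid is None:
        return None  # source is not mapped to an Ancestry database
    record_id = person_attrs.get(PERSON_APID_ATTR_PREFIX + dbid)
    if record_id is None:
        return None  # no record id entered for this person in that database
    # The "1," prefix is copied as-is from the example line in the message.
    return f"1,{dbid}::{record_id}"


def apid_gedcom_line(level, value):
    """Format the tag as a gedcom line; the level is supplied by the caller."""
    return f"{level} _APID {value}"


def record_url(dbid, record_id):
    """Rebuild the record URL in the form quoted in the message."""
    query = urlencode({"indiv": 1, "dbid": dbid, "h": record_id})
    return f"http://search.ancestry.com/cgi-bin/sse.dll?{query}"


source = {"Ancestry DBID": "6061"}
person = {"Ancestry APID H:6061": "49733277"}
print(apid_gedcom_line(3, apid_value(source, person)))
print(record_url("6061", "49733277"))
```

Returning None when either attribute is missing means an exporter built this way would simply omit the tag for sources or people that have not been annotated.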
 
 My gramplet will, for the active citation (if its source has the attribute), enumerate the events
 that cite it and list the people in them. It sorts them according to the order attribute used by
 the census/forms addon. Each person can have a record ID entered. For convenience, you can also
 paste in a URL to the record, and it pulls out the record id (from the "h" parameter).
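
Pulling the record id out of a pasted URL could look roughly like this. It is a minimal sketch using only the Python standard library, not the gramplet's actual code; the function name is hypothetical, and treating an all-digit input as an already-extracted record id is an assumption based on the example value above.

```python
from urllib.parse import parse_qs, urlparse


def extract_record_id(text):
    """Return the record id from a pasted value, or None if none is found.

    Accepts either a bare id or a record URL carrying the id in its
    "h" query parameter.
    """
    text = text.strip()
    if text.isdigit():
        # Assumption: a purely numeric entry is already a record id.
        return text
    params = parse_qs(urlparse(text).query)
    values = params.get("h")
    return values[0] if values else None


url = "http://search.ancestry.com/cgi-bin/sse.dll?indiv=1&dbid=6061&h=49733277"
print(extract_record_id(url))  # 49733277
```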
 
 The takeaways:
 So I've been going through the process of generating a gedcom, uploading it to Ancestry, going
 through the hints to add the attributes for my citations, then deleting the Ancestry tree and
 repeating. In doing so, using only hints (not actively finding records on Ancestry), it's succeeded
 at suggesting the majority of records in my test set of sources [1]. Once these attributes are in
 place, the remaining hints are either false positives or genuinely records that I didn't yet have
 cited in my tree.
 
 One issue I've realized exists is when the same person appears in multiple records within the same
 database. For instance, if the same person is in one marriage license as the groom and in another
 as the father of the bride. I haven't tested this yet. I think the back-end will work, but there
 isn't enough information for the UI to be well behaved.
 
 So if you've read this far, thanks! I'd like any feedback you may have to offer. I'll package up the
 code later tonight, but as another warning, it's still very much first-attempt quality.
 
 [1] US census sources, PA birth and death certificates, United States Social Security Death Index
 
 ------------------------------------------------------------------------------
 Transform Data into Opportunity.
 Accelerate data analysis in your applications with
 Intel Data Analytics Acceleration Library.
 Click to learn more.
 http://pubads.g.doubleclick.net/gampad/clk?id=278785231&iu=/4140
 _______________________________________________
 Gramps-devel mailing list
 [hidden email]
 https://lists.sourceforge.net/lists/listinfo/gramps-devel
