How bad is my data?


Rich Lakey
I'm just venting for a moment, then going for a walk before trying to analyze my problem.

I just ran a Family sheet report for an individual to see what the report looks like. For that person I noticed a source reference that said "Death Certificate for Caroline Niehaus", but the report was not for Caroline Niehaus. I looked in the Citations and found about thirty citations that said "Death Certificate for Caroline Niehaus". None of them referenced Caroline Niehaus, only other people. Then I noticed other people who had duplicate citations that did not reference them. I looked at the last-changed date for these citations, and they were modified February 3, 2017 at 9:28 AM. In fact, I would guess that of over 10,000 citations, 9,000 were last changed at this date and time. I am so sick I can't even look at this anymore for now. It would seem all the work I have done since then is good; most of this is citations that came with my GEDCOM years ago. I can't imagine what this will take. I have backups, but do I want to go back a year and a half?
My thought is to do nothing until I understand the scope of the problem. With 16,000-plus people, I have probably not looked at most of these, and the error is not obvious. Looking at the person entry where the above error was noted, I found it as a citation for the person's birth. ARG.
Rich


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users
https://gramps-project.org

Re: How bad is my data?

adrian.davey
I definitely agree with the proposition of doing nothing at least until
you have gone for that long walk, and I doubt that even a total disaster
would persuade me to go back as far as 18 months!

I take it that you think you have ~9000 rogue citations, but they do not
all reference a single source. If I interpret you correctly you
have—say—30 rogue citations on average for each of ~300 sources.

One option, in citation view, is to filter on that last changed date. Then:

sort by source ID
examine first source ID group
work out which (if any) of the citations is validly referenced
select all of the citations for that ID that are not valid
delete [remove] all those citations
move to next source ID group
repeat, etc

Still potentially tedious, I know [especially as, unfortunately,
citation view does not give the option of a column displaying the
referencing event], but if your 30x stat is a good guide, it is at
least only ~1/30 of the work of going through them one by one.
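
The filter-then-group steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`id`, `source_id`, `changed`) and the sample IDs are hypothetical and are not the Gramps database schema or API. It shows the idea of narrowing to one last-changed timestamp and then reviewing the citations one source group at a time.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical citation records; field names are illustrative only,
# not the Gramps schema.
citations = [
    {"id": "C0001", "source_id": "S0001", "changed": datetime(2017, 2, 3, 9, 28)},
    {"id": "C0002", "source_id": "S0001", "changed": datetime(2017, 2, 3, 9, 28)},
    {"id": "C0003", "source_id": "S0002", "changed": datetime(2017, 2, 3, 9, 28)},
    {"id": "C0004", "source_id": "S0001", "changed": datetime(2018, 5, 1, 10, 0)},
]


def group_suspects(records, changed_at):
    """Group citation IDs by source ID, keeping only records whose
    last-changed time matches changed_at to the minute."""
    groups = defaultdict(list)
    for rec in records:
        stamp = rec["changed"].replace(second=0, microsecond=0)
        if stamp == changed_at:
            groups[rec["source_id"]].append(rec["id"])
    # Sort by source ID so each group can be reviewed in turn.
    return dict(sorted(groups.items()))


suspects = group_suspects(citations, datetime(2017, 2, 3, 9, 28))
for source_id, citation_ids in suspects.items():
    print(source_id, len(citation_ids), citation_ids)
```

Each printed group is one review unit: decide which citation (if any) is valid, then remove the rest of that group in one go.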

If you do happen to have 9000 rogue citations to a single source, this
method, applied just once, would definitely work very quickly, and even
if you cannot work out the single instance that needs to be retained or
recreated I would happily live with that in this situation! [If only I
could be confident there is only one citation missing from my entire db!]

Adrian


Re: How bad is my data?

Rich Lakey
In reply to this post by Rich Lakey
Hopefully this was a false alarm. It is something I knew I needed to straighten out someday, but I didn't know it was this bad. I believe these citations were in the original GEDCOM I got from my dad, which came from his FTM in 2010. The date was not when they were last changed but when they were last imported. Dad, bless his heart, didn't do sources and citations well. They were no more than notes with nothing referenced, no image attached, etc. I knew years ago I needed to clean them up some day. What I didn't realize was that some of them point to the wrong people and are duplicates.
Loading my oldest backup, from early December 2016, shows the same problem. I am going to take some time tomorrow to make sure the ones I have added are OK.


Re: How bad is my data?

Rich Lakey
In reply to this post by adrian.davey
I would agree I need to get rid of invalid or empty sources and citations and not put it off till I forget about it. I may get back to you about the filter after I see the problem more clearly.


Re: How bad is my data?

Doug-11
In reply to this post by Rich Lakey
You have my deep sympathy. I've recently discovered a screw-up in my database with reference regions in images and relationships of people in families getting garbled and swapped around.
Haven't yet got the courage to tackle it.

Doug


Re: How bad is my data?

prculley
While it looks like you have found your issue at this point, I would like to point out a couple of tools which can help to analyze this kind of thing in the future. 

The first tool is the database differences report. That tool can be used to compare an old version from backup with the current version of your database and report on anything that's changed.

The second tool is the import and merge tool. It is only available for the upcoming Gramps 5.0.0.  It will also show the differences between two related databases such as from backup in your case. However, it will also allow you to decide what to do with each difference, that is, keep it or discard it.

Both tools are only useful when working with related databases, that is, when one is an older backup of the other or an edited version of the other. If the databases are not related, i.e. they came from two different GEDCOMs, the tools are useless, since they will tell you that everything is different.
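
To illustrate why such a comparison only makes sense for related databases, here is a minimal Python sketch. It is not how either Gramps tool is implemented; the snapshot layout (a dict keyed by record ID) and the sample records are assumptions made up for the example. The point is that a diff relies on IDs shared between the backup and the current data.

```python
def diff_snapshots(old, new):
    """Compare two snapshots keyed by a stable record ID.

    Returns three sorted lists: IDs only in new, IDs only in old,
    and IDs present in both whose contents differ.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Hypothetical data: a backup and the current state of the same database.
backup = {
    "C0001": {"source_id": "S0001", "page": "p. 12"},
    "C0002": {"source_id": "S0001", "page": ""},
}
current = {
    "C0001": {"source_id": "S0001", "page": "p. 12, line 4"},
    "C0003": {"source_id": "S0002", "page": "sheet 3"},
}

added, removed, changed = diff_snapshots(backup, current)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

If the two snapshots came from unrelated imports, the ID sets would barely overlap, so nearly every record would land in `added` or `removed` and the report would carry no useful information.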

Paul Culley 


Re: How bad is my data?

Michael Tennant

I came across this, which is not a solution but might help towards an understanding of the problem Rich is facing:

GEDCOM is a data format for genealogical data. GEDCOM allows transferring data from one genealogy application to another, but because of inherent GEDCOM limitations, incomplete specifications, unsupported dialects and poor implementations, that transfer may be less than perfect. On top of that, many applications do not even provide an import log to help you figure out how well the import went.

GEDCOM is not perfect, and not perfectly supported either, but it is the only widely supported standard.
In practice, basic data such as names and vital events transfers just fine, and that is already a large improvement on a world without any standard for genealogy data. A lot of other data such as notes and sources generally transfers successfully as well. Moreover, GEDCOM dialects of popular products tend to be supported by many other products.

Vendors tend to stress the ability of their product to import GEDCOM files created by competing products, but to a user, the more important thing is the quality of the GEDCOM files it exports, as that largely determines the ability of other products to import those GEDCOM files. Only when other applications will import the file can you use a GEDCOM file to do what it was designed to; move your data from one application to another.


I fear this says it all: what is exported from one application and what is imported by another application (in this case Gramps) may be far removed from one another. I think the biggest issue is that Gramps differentiates cleanly between what we think of as a SOURCE and a CITATION, whereas other applications are a bit flakier on this distinction, hence the errors. GEDCOMs do transfer basic personal data very well, but the rest needs to be checked very carefully. Something for the long winter evenings, Rich!
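
One way to check what the exporting application actually wrote is to look at the GEDCOM file itself. The Python sketch below assumes a GEDCOM 5.5-style file, where a source record starts with a level-0 line such as `0 @S1@ SOUR` and a citation is a lower-level `SOUR @S1@` pointer. It counts how many times each source record is cited. The sample text and the people in it are invented for illustration, and real exports (including FTM dialects) may differ.

```python
import re
from collections import Counter

# A GEDCOM line is: level, optional @XREF@, tag, optional value.
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")


def count_citations(gedcom_text):
    """Return (titles, counts): source titles by xref, and the number of
    citation pointers (n SOUR @xref@) found for each source xref."""
    titles = {}
    counts = Counter()
    current_source = None
    for raw in gedcom_text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        level = int(m.group(1))
        xref, tag = m.group(2), m.group(3)
        value = (m.group(4) or "").strip()
        if level == 0:
            # Track whether we are inside a source record.
            current_source = xref if (tag == "SOUR" and xref) else None
            if current_source:
                titles.setdefault(current_source, "")
        elif level == 1 and tag == "TITL" and current_source:
            titles[current_source] = value
        elif tag == "SOUR" and value.startswith("@"):
            # A pointer to a source record, i.e. a citation.
            counts[value] += 1
    return titles, counts


# Invented sample data for illustration only.
sample = """
0 HEAD
1 SOUR FTM
0 @I1@ INDI
1 NAME Anna /Example/
1 BIRT
2 SOUR @S1@
0 @I2@ INDI
1 NAME Bert /Example/
1 SOUR @S1@
1 DEAT
2 SOUR @S2@
0 @S1@ SOUR
1 TITL Death Certificate for Caroline Niehaus
0 @S2@ SOUR
1 TITL Family Bible
0 TRLR
"""

titles, counts = count_citations(sample)
for xref, n in counts.most_common():
    print(xref, n, titles.get(xref, ""))
```

A source whose title names one person but whose count is in the dozens is a likely candidate for the kind of misattached citations described in this thread.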


Mike




Re: How bad is my data?

Rich Lakey
I'm making some progress. Of my 10,700 citations, I thought perhaps 9,000 were bad. Now I am thinking more like 6,000 to 7,000 are bad. But this morning I removed over 1,000 and also cut the grass, so it might not take more than a couple of days. Not all the citations with a date of Feb 3, 2017 are bad; I created many of them.
So far I have found sources with 4 to 283 citations attached; 50 to 100 seems about the norm. These citations have a reference to a person but no event, so looking at the person I would never have seen the bad citation. On rare occasions the reference also has an event, and then I will see the citation in the event for that person. I have not yet found a citation that has valid data.
I know myself that sources and citations are not easy to understand. And you need to understand where my father was coming from. He started genealogy research before 1980 and got his first computer in 1982. I think it was some time after that before real research could be done on the Internet. He got much of his information by sending off a snail-mail request with a stamped return envelope. I'm still amazed by the images of census records, ships' logs, etc. that I can get from Ancestry. And I have tubs of his documents, including birth and death certificates he sent off for, that I need to go through.
So when Dad created a citation, all it has is the title. The title might be a reference to a book, a person's name and email, or a web address that is no longer valid. It doesn't tell me what the citation was for. Perhaps this data was lost in the GEDCOM. I have his files and his laptop, so I could check this out further. I'm 75 and have at least 50 more years of work that I know of. My dad passed away in late 2016 at 100, still using his laptop for research to the end.
We need to keep this in mind when we create a citation. I try to attach to the citation's gallery any image I can get (birth certificate, census image, etc.), and if I can't get the image, I put the text into an attached note along with where the text came from.
Rich


On 05/17/2018 10:30 AM, Michael Tennant wrote:

I came across this, which is not a solution but might help towards an understanding of the problem Rich is facing:

GEDCOM is a data format for genealogical data. GEDCOM allows transferring data from one genealogy application to another, but because of inherent GEDCOM limitations, incomplete specifications, unsupported dialects and poor implementations, that transfer may be less than perfect. On top of that, many applications do not even provide an import log to help you figure out how well the import went.

GEDCOM is not perfect, and not perfectly supported either, but it is the only widely supported standard.
In practice, basic data such as names and vital events transfers just fine, and that is already a large improvement on a world without any standard for genealogy data. A lot of other data such as notes and sources generally transfers successfully as well. Moreover, GEDCOM dialects of popular products tend to be supported by many other products.

Vendors tend to stress the ability of their product to import GEDCOM files created by competing products, but to a user, the more important thing is the quality of the GEDCOM files it exports, as that largely determines the ability of other products to import those GEDCOM files. Only when other applications will import the file can you use a GEDCOM file to do what it was designed to; move your data from one application to another.


I fear this says it all:  what is exported from one application and what is imported by another application (in this case Gramps) may be far removed from one another.  I think the biggest issue with Gramps is how it differentiates cleanly between what we think of as a SOURCE and a CITATION: other applications are a bit more flakey on this distinction, hence the errors.  GEDCOMs do transfer basic personal data very well, but the rest needs to be checked very carefully.  Something for the long winter evenings, Rich!


Mike



On 17 May 2018 at 13:57, Paul Culley <[hidden email]> wrote:
While it looks like you have found your issue at this point, I would like to point out a couple of tools which can help to analyze this kind of thing in the future. 

The first tool is the database differences report. That tool can be used to compare an old version from backup with the current version of your database and report on anything that's changed.

The second tool is the import and merge tool. It is only available for the upcoming Gramps 5.0.0.  It will also show the differences between two related databases such as from backup in your case. However, it will also allow you to decide what to do with each difference, that is, keep it or discard it.

Both tools are only useful if working with related databases, that is one from an older backup of the other, or an edited version of the other. If the databases are not related, IE they came from two different Gedcoms, the tools are useless, since they will tell you that everything is different.

Paul Culley 

On Thu, May 17, 2018, 7:04 AM Doug <[hidden email]> wrote:
On 17/05/18 03:32, Rich wrote:
Hopefully this was a false alarm. It was something I knew I needed to straighten out someday, but I didn't know it was this bad. I believe these citations were in the original GEDCOM I got from my dad, which came from his FTM in 2010.  The date was not when the citations were last changed but the last time they were imported.   Dad, bless his heart, didn't do sources and citations well. They were no more than notes with nothing referenced, no image attached, etc. I knew years ago I needed to clean them up some day. What I didn't realize was that some of them point to the wrong people and some are duplicates.
I loaded my oldest backup, from early December 2016, and it has the same problem. I am going to take some time tomorrow to make sure the ones I have added are OK.

You have my deep sympathy. I've recently discovered a screw-up in my database with reference regions in images and relationships of people in families getting garbled and swapped around.
Haven't yet got the courage to tackle it.

Doug

Old laptops (was Re: How bad is my data?)

Ron Johnson
In reply to this post by Michael Tennant
On 05/17/2018 01:30 PM, Rich wrote:
[snip]
> My dad passed away late 2016 at 100, still using his laptop for research
> to the end.
> We need to keep this in mind when we created a citation. I try to attach
> in the citations gallery any image I can get, birth certificate, census
> image, etc. and if I can't get the image get the text into an attached
> note with where the text came from.

I don't know how it works on Windows, but I'd:
1. look for "virtualization" software,
2. make an "image" of that old laptop's disk, and
3. use the virtualization software to emulate the old laptop on your
(presumably newer) laptop or desktop.

A benefit of this is that you can then move that "disk image" to any new
computer you get and even pass it on to whoever takes the mantle from you.

--
Angular momentum makes the world go 'round.


Re: Old laptops (was Re: How bad is my data?)

Rich Lakey
Thanks Ron, but I remembered that I have the old version 11 of FTM that Dad used on Windows 7, running under WINE on my machine.  On a brief look last night I found the citations in that old FTM confusing: not really a citation or a source but a sort of combination.  It also only lets me see the entry for one person, not a list of all citations. I have brought his machine up and hope to look closer this weekend.  Although it is about 8 years old it isn't too slow. I have all his files on my machine for everything. You never know when a computer will give up.
I've been through over 3,000 citations. Some don't even have a reference.
Rich 






Re: Old laptops (was Re: How bad is my data?)

enno

Hello Rich,

> Thanks Ron,  But I remembered I have the old version 11 of FTM that dad used on Windows 7 on my machine running under WINE.  On a brief look last night I found citations on that old FTM confusing. Not really a citation or a source but sort of a combination.  It also only allows me to see the entry for that person not a list of all citations. I have brought his machine up and hope to look closer this weekend.  Although it is about 8 years old it isn't to slow. I have all his files on my machine for everything. Never know when a computer will give it up.
> I've been through over 3,000 citations. Some don't even have a reference.

Do you suspect that your citations were messed up by Gramps? You didn't directly suggest such a thing, but I would like to know if you do, because other users could benefit from knowing too.

One of the reasons for asking is that you reported that lots of citations had the same date of last change, and that date may be set when you run check & repair on your database, but also when you let Gramps search for identical ones, and merge those.

When citations have no date or page of their own, and they're linked to the same source, Gramps will see those as identical, even when some citations have notes. You can tick a box to prevent this, but if you don't, you may very well end up with something that you don't really want.
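
A rough Python sketch of the matching rule described above. This is not Gramps' actual merge code; the `Citation` class, the `find_identical` function and the `ignore_with_notes` switch are hypothetical stand-ins for the behavior and the tick box mentioned in this post.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Citation:
    cid: str
    source: str          # handle of the source this citation points to
    page: str = ""       # volume/page text, often empty after an import
    date: str = ""
    notes: list = field(default_factory=list)


def find_identical(citations, ignore_with_notes=False):
    """Group citations that share the same source, page and date.

    With ignore_with_notes=True, citations carrying notes are skipped,
    so their notes are never folded into another citation.
    """
    groups = defaultdict(list)
    for c in citations:
        if ignore_with_notes and c.notes:
            continue
        groups[(c.source, c.page, c.date)].append(c.cid)
    return [ids for ids in groups.values() if len(ids) > 1]


cits = [
    Citation("C1", "S1"),
    Citation("C2", "S1", notes=["Birth date from email"]),
    Citation("C3", "S1"),
    Citation("C4", "S1", page="p. 12"),
]
merge_all = find_identical(cits)
merge_safe = find_identical(cits, ignore_with_notes=True)
print(merge_all)    # C1, C2 and C3 are treated as identical
print(merge_safe)   # only C1 and C3 are treated as identical
```

With empty page and date fields, the only thing left to compare is the source, which is why imported citations that carry nothing but a title can collapse into one.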

Also, please note that sources and citations are shared, so that when you change the source title for one person, all other citations referring to that source will also show the new title. This is something that you might not expect when you rewrite a source title for a single event. If the source was shared before, say because lots of events were documented in a single family bible, changing the title for one event to that of a particular record will mess up a lot.

This sharing also applies to places, so you should also never change a place name when you discover that an event was registered with the wrong place. You should unlink the place from the event instead, and then link the event to another place, which you may create on the fly.
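
The sharing described above can be pictured with a small Python sketch. The `Place` and `Event` classes are hypothetical and much simpler than Gramps' real objects; they only show the difference between renaming a shared object and relinking one event. The same reasoning applies to shared source titles.

```python
class Place:
    def __init__(self, name):
        self.name = name


class Event:
    def __init__(self, description, place):
        self.description = description
        self.place = place  # a reference to a shared Place, not a copy


springfield = Place("Springfield")
birth = Event("Birth of Anna", springfield)
marriage = Event("Marriage of Anna", springfield)

# Wrong: renaming the shared place changes it for every event using it.
# birth.place.name = "Shelbyville"   # the marriage would move as well

# Better: unlink the event from the old place and link it to another
# one, created on the fly here.
birth.place = Place("Shelbyville")

print(birth.place.name)
print(marriage.place.name)
```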

This place thing looks very weird if you come from another program, but it makes Gramps very powerful, if you know how to work with it.

Cheers,

Enno



Re: Old laptops (was Re: How bad is my data?)

Rich Lakey
I imported this GEDCOM in 2010, so I can't remember anything about the import. I was a brand new Gramps user at the time. The date in the record, I believe, was the last time I imported a Gramps XML. I found several years ago that Gramps slows down over time and reimporting the data seems to clean things up.   My preliminary look at my dad's FTM V11 shows a single citation/source where he only filled out the title with where he got his information.  It might give a person's name and email, or correspondence with so-and-so, one of the genealogy boards that no longer exist, or one of the FTM disks you could buy with info.  It does not say what the info was: birth date, marriage data, etc.
But in the import these citations seem to have been replicated. There can be anywhere from 1 to 288 citations linked to a single source.  Most (maybe 80%) of the citations reference a person and not an event, so they do not show up in the person view attached to anything.  Maybe another 15% of the citations have no reference.  Less than 5% are connected to a person's event.  These are the only ones I had previously been aware of. I would see one once in a while when attaching a citation to an event. They looked a bit strange and nondescriptive but didn't seem to be a problem, and I thought I would get to them some day.
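
A breakdown like the one above could be tallied with something along these lines. This is only a sketch in plain Python: it assumes the citation references have already been extracted into a dictionary, and the `tally_references` function and the sample data are invented for illustration, not taken from my database or from Gramps.

```python
from collections import Counter


def tally_references(citation_refs):
    """Count citations by the kind of object that refers to them.

    citation_refs maps a citation id to a list of referrer kinds, for
    example ["person"] or ["event"]; an empty list means nothing
    refers to the citation at all.
    """
    counts = Counter()
    for kinds in citation_refs.values():
        if not kinds:
            counts["unreferenced"] += 1
        elif "event" in kinds:
            counts["event"] += 1
        else:
            counts["person only"] += 1
    return counts


sample = {
    "C1": ["person"],
    "C2": ["person"],
    "C3": [],
    "C4": ["event"],
    "C5": ["person", "event"],
}
counts = tally_references(sample)
total = len(sample)
for kind, n in counts.most_common():
    print(f"{kind}: {n} ({100 * n / total:.0f}%)")
```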
Bottom line: I see no problem with Gramps. This is just something I need to wade through.  And as others have mentioned, GEDCOM is not perfect.  I first posted about this because I needed to talk to someone, and I thought the problem was much, much more serious.  One word of wisdom others mentioned: do not trust a GEDCOM import.
Rich





Re: Old laptops (was Re: How bad is my data?)

enno

Hello Rich,

> I imported this GEDCOM in 2010. So I can't remember anything about the  import. I was a brand new GRAMPS user at the time. The date in the record I believe was the last time I imported a Gramps XML. I found several years ago that Gramps slows down over time and reimporting the data seems to clean things up.   My preliminary look at my dads FTM V11 shows a single citation/source that he only filled out the title with where he got his information.  It might give a persons name and email, or correspondence with so and so. one of the genealogy boards that no longer exist or from one of the FTM disk you could buy with info.  It does not say what the info was. Birth date, marriage data etc.
> But in the import these citations seem replicated. Can be 1 to 288 citations linked to a single source.

My late dad did the same thing as yours, putting everything in the title, in another piece of software, which was Brother's Keeper. When he died, I exported that to a GEDCOM file, imported that into PAF, and later moved all data to Gramps.

In that process, citations never got mixed up in the way that you described, so I'm still curious about what happened, and what the original GEDCOM looked like, because I see no reason for Gramps, or any other software, to mess up things this way.

> Bottom line, I see no problem with Gramps, This is just something I need to wade through.  And as others have mentioned, GEDCOM is not perfect.

Although it's not, indeed, there is absolutely no reason for citations getting mangled in any way. If FTM wrote them to the GEDCOM in a proper way, Gramps should read them right, and there is no reason to expect any mangling in that area.

> I first posted about this as a need to talk to someone and I thought the problem was much much more serious.  One word of wisdom others mentioned, do not trust a GEDCOM import.

H'm, so much for wisdom. I'm a software engineer by profession, I know Gramps' GEDCOM code, and I see no reason to distrust GEDCOM import of sources and citations. Gramps' GEDCOM import is thoroughly tested, and there is no reason for doubt in that area. That's a red herring.

My dad's work has survived 2 layers of GEDCOM import, from Brother's Keeper to PAF, and then to Gramps 3.3, or 3.4, I'm not sure. And in either no sources or citations were re-arranged in any way. The only thing that happened was PAF truncating long titles behind my back. And I later made a direct import from a Brother's Keeper GEDCOM into Gramps 3.4 to get the full ones. And that import was flawless too.

So, if citations were 'replicated' you can't really blame GEDCOM itself, because if that were the case, my imports would have gone wrong too. GEDCOM's citation and source structure is pretty straightforward, so if it goes wrong, there must be a bug somewhere. No doubt about that.

Regards,

Enno



Re: Old laptops (was Re: How bad is my data?)

sturdy
In reply to this post by Rich Lakey
Hi Rich,

A few years ago I had a similar issue, but it was not with Gramps or
GEDCOM import or export. I was using the old FTM 2010 and 2012 synced to
an Ancestry.com tree. I am now sure the problem was the sync process
between Ancestry and FTM. I had hundreds/thousands of duplicates that
were actually kept in the Ancestry tree. I even had tree branches
duplicated. Eventually, I discovered that when the sync process failed I
could expect problems and they were very difficult to find. Eventually,
after talking to Enno, I gave up, imported the data and removed the
redundancies in Gramps. There were a few issues with GEDCOM
export/import but they were minor and did not impact redundancy. Those
sync issues seem to be corrected with the new version of FTM by MacKiev.

As an aside, some duplication is a normal part of the GEDCOM process.
When a citation is linked to more than one event such as name, birth and
marriage, the GEDCOM process will create a similar citation for each
event. Once the data is in Gramps, you can remove the duplicates by
sharing one citation with each event and removing the others that are
not needed. But it can be a long and tedious process that requires a
great deal of care.
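
To get a feel for how much of this duplication to expect before importing, one could count the source pointers in the GEDCOM file itself. The Python sketch below is a simplified illustration, not a full GEDCOM parser: the sample record is invented, and the hypothetical `count_source_citations` function only looks at lines of the form `<level> SOUR @<id>@`.

```python
from collections import Counter

# Invented sample: one source cited on the name, the birth and the death.
GEDCOM = """\
0 @I1@ INDI
1 NAME Anna /Smith/
2 SOUR @S1@
1 BIRT
2 DATE 1 JAN 1900
2 SOUR @S1@
1 DEAT
2 DATE 2 FEB 1980
2 SOUR @S1@
0 @S1@ SOUR
1 TITL Family bible
"""


def count_source_citations(text):
    """Count how often each source record is cited in a GEDCOM text."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        # A citation looks like "<level> SOUR @<id>@" with level > 0;
        # the level-0 line "0 @S1@ SOUR" is the source record itself.
        if len(parts) == 3 and parts[0] != "0" and parts[1] == "SOUR":
            counts[parts[2].strip()] += 1
    return counts


citation_counts = count_source_citations(GEDCOM)
print(citation_counts)
```

Each counted pointer is a place where a separate citation can appear after import, so a source with a high count is a good candidate for sharing one citation afterwards.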

I also found the old FTM sync would cause other issues which you seem to
have had. Often citations would become unlinked and no longer point to
an event. However, they remained linked to the person name. There were
also many Ancestry.com records that when saved to a tree, did create a
name link but did not create the expected event links. There were many
other problems with Ancestry.com and the old FTM versions.

HTH,
BobS



