request for review, for 5.0.0

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

request for review, for 5.0.0

Paul Franklin-5
Summary

I would like to request a review, discussion, modification
(if necessary), and whatever else might be required to
commit the attached patchfile to 5.0.0, as I think it is
important that 10078[1] be fixed before the "final" 5.0.0 is
released.


History

The DB method get_person_handles has had the optional
argument sort_handles for some time.  It is in 3.4.9 for
instance.  Likewise the 3.4.9 Complete Individual report has
a call of get_person_handles with sort_handles=True, which
sorted the handles by the persons' surnames.

For 5.0.0 that method's ability to provide sorted handles
was augmented to sort on the person's given name along with
their surname.  Just recently a user commented (on the list
or in a bug report, I forget,) that he had about 250 people
with a surname of "Smith" (IIRC) and it was for users like
that that the given-name augmentation was added.  (I had
noticed it in data.gramps also, which has a lot of Smiths.)

Similarly, for 5.0.0 the feature request 8649[2] was
implemented, adding the ability to sort family handles
similarly, for a new filter in the Family Group report.

Eventually, a user doing alpha testing of 5.0.0 (thanks!)
had the Complete Individual report crash on him.  The
discussion and ensuing debugging took a while, since the
user had a large DB and didn't know where the problem was.
But he persevered (thanks!) and was eventually able to
isolate the problem individual, and Josip then noticed the
person had no line like this in the gramps XML:

<surname></surname>

That is, they had no surname field at all, not even a null
one -- only a first-name field.

So the 10078 user used a text editor (with a multiline
search capability) to find the offending people, half a
dozen (IIRC), and fix them.

Sometime later Sam added a comment to 10078 with two lines
added to the XML exporter, which I don't recall the user
ever commenting on (perhaps because his problem had been
edited away).  Those two lines eventually turned into pull
release 608[3].

When a beta-tester second user filed a bug with the same
problem[4], Paul looked into it and commented on p.r. 608
that he felt another approach was better, then filed pull
request 613[5] accordingly.  It modifies the DB layer to add
a test for a missing surname, so that the DB layer no longer
crashes.

Paul also commented that it was likely that some gramps
importer was (or importers were) causing the problem, which
had just never been noticed before now, since only the 5.0.0
changes were provoking it.

Subsequently, Sam cancelled his p.r. 608 in favor of Paul's
p.r. 613.

I didn't like the approach in 613, since I don't like the
idea of slowing down the DB layer with an "if" -- especially
when it may be the case that only some DBs have the problem,
maybe even a small number.  We can't know for sure of
course, since there has only been a limited amount of alpha
and beta testing of 5.0.0.

So I emailed a preliminary version of the attached patchfile
to Paul, on about 12 May (which was before "rc1" of course),
and also asked him not to commit 613.  He never responded; I
assume because he was "on the road" and then had to do the
5.0.0-rc1 release after he got back.


Proposal and Argument

So I am taking the somewhat-unusual step of attaching the
updated patchfile and asking that it be committed in time
for 5.0.0, as that may happen relatively quickly, I don't
know.

As you will see when you look at the attached file, it
modifies these files:

gramps/plugins/export/exportxml.py
gramps/plugins/importer/importxml.py
gramps/plugins/importer/test/importvcard_test.py
gramps/plugins/lib/libgedcom.py
gramps/plugins/textreport/familygroup.py
gramps/plugins/textreport/indivcomplete.py
gramps/plugins/tool/check.py

The XML and GEDCOM importers were modified to add a null
surname if necessary, should a user import a person which
needs one.  (I didn't look at the other importers but my
guess is that these two will cover most such imports.  I
think the 10078 user said he'd done his import years
earlier, I assume via GEDCOM.)

I am by no means a GEDCOM expert but it seems to be the case
that only a GEDCOM like this triggers it:

0 @I0000@ INDI
1 NAME Firstname
2 GIVN Firstname
1 SEX M

since gramps correctly assigns a null surname if instead the
GEDCOM is:

0 @I0000@ INDI
1 NAME Firstname //
2 GIVN Firstname
1 SEX M

so evidently, at least in the past, some genealogy programs
exported the former (incorrect I assume) format, whereas
others exported the latter (correct I assume) format.

(I believe gramps does "the right thing" and assigns a null
surname when a user creates a person with only a given name,
at least in my testing, so nothing needs fixing there.)

To cover the case where the user's gramps DB already has
people with a missing surname, I incorporated Sam's p.r. 608
patch into the attached patchfile, since that pull release
modifies the XML exporter to assign such no-surname people a
null surname.

However that p.r. also caused Travis to fail, since the
vcard importer is (apparently) similarly faulty.  (I didn't
look, since I assume that very few people will import a
vcard DB into gramps, but if needed I imagine it could be
fixed similarly.)  So another modified module fixes the
vcard unit test which used to fail.

(Parenthetically, let me note that I tried to test the
attached patchfile, by doing things like "python3 setup.py
test" but I got failures.  I also got failures when I did
the same thing on an unpatched 5.0.0-beta1 so I stopped
trying to see if I could test this patch.  Somebody who
knows more about testing will need to look into it.  I can't
tell if Travis is ignoring tests which involve the
TestcaseGenerator for instance.  Nor could I get the
imports_test.py module to run stand-alone, even in an
unpatched gramps.  So clearly I am ignorant about such
things, or have a faulty machine, or whatever.  However, I
have of course tested the attached code, and to the best of
my knowledge it all works.)

Note that none of the patches I have mentioned so far
involve any strings, so the fact that gramps is under a
"string freeze" shouldn't inhibit their commitment.

Note also that if only those which I have already mentioned
were committed, then gramps would be better.  If a user's DB
triggered a crash -- which can only happen in the Complete
Individual or Family Group reports, to the best of my
knowledge -- if they do get a crash they can be instructed
to export and then import their DB as XML (into an empty DB
of course), which is one of the standard ways we advise
users to fix some problems.

However, the attached patchfile also modifies those two
reports, to not crash when the user has an offending
database.  It does that by detecting the DB-layer crash (an
"IndexError") inside a try/except, then advises the user
what to do to fix it.

That is one reason I requested Paul not to commit his
p.r. 613, as that crash is what I detect in my two modified
reports.  Another reason is that 613 doesn't really fix
anything in the database; it just makes it so that gramps
doesn't crash with a DB like that.  Whereas the attached
patchfile not only doesn't crash (in those two reports) but
it also fixes the underlying problem(s).  I feel that
suspenders-and-belt approach (or braces-and-belt if you
prefer) is a better one, more comprehensive.

Note that while the two modified reports have strings in
their patches, those strings are already used in gramps, so
they too won't affect the "string freeze."

The final module the attached patchfile changes is the tool
to check and repair the database, since the 10078 user
reported that he had tried running that tool but it hadn't
helped him.  So I put in a test for a person with a missing
Surname field which added one to any person who needed one.

The two strings added to patch plugins/tool/check.py are the
only two new strings added by this patch.  I claim that
gramps will be better if the tool is so modified, even if
those two new strings end up not being translated, since the
lines in the two reports which say to run that tool are old
ones and thus are already translated, so the user will be
told to run it.  The patched tool will still correct the DB
problem, even if the summary information has an untranslated
line or two.

As an alternative, if the "string freeze" really is written
in a tablet of stone, two other strings could be used in the
new section of check.py instead, some already-existing ones,
which could be thought of as placeholders and changed to the
correct ones as soon as the final 5.0.0 is released, for
5.0.1.  I don't like that alternative, but I suppose it's
possible.

Note that my patches are not time-sensitive.  They will not
be done for DB accesses.  If the DB already has a problem,
either a report will notice it and tell the user to run the
tool, or else the user will happen to import or export their
data and the problem will also be fixed, forevermore.  And
with the GEDCOM and XML importers no longer causing a
problem for anything imported new, these no-surname problems
in gramps DBs should slowly disappear.

Note that I am only talking about the BSDDB database and its
layer.  When I converted a BSDDB DB to SQLite and then ran
those two reports, neither crashed.  However, when I then
converted that converted SQLite DB back into a BSDDB DB,
that doubly-converted DB still (or "also") has the problem.
So it can't be solved by telling the user to convert and
then convert back.

So that's what I propose, and I hope the decision is made to
commit the attached patchfile (or some slightly-altered
version, if necessary).

Although as I said, I don't know what Travis will do, and I
can imagine that somebody who understands testing will need
to modify some test module, if it then starts failing after
these patches.  (Although why some test should expect a
person to have no Surname at all I can't imagine.)

Thank you.



[1]
https://gramps-project.org/bugs/view.php?id=10078

[2]
https://gramps-project.org/bugs/view.php?id=8649

[3]
https://github.com/gramps-project/gramps/pull/608

[4]
https://gramps-project.org/bugs/view.php?id=10577

[5]
https://github.com/gramps-project/gramps/pull/613

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

10078-20180521B.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: request for review, for 5.0.0

Nick Hall
Paul,

Thanks for the patch.

It is valid to omit the surname tag in the XML.  Have a look at the DTD
and Relax NG schema:

https://github.com/gramps-project/gramps/blob/master/data/grampsxml.dtd#L122
https://github.com/gramps-project/gramps/blob/master/data/grampsxml.rng#L229

However, we need to decide whether to create an empty surname object in
cases where the surname is missing.  I will await comments from other
developers, especially Paul Culley.

Adding new strings is acceptable when fixing bugs.  Generally you should
try to avoid doing so, or use existing strings, but sometimes the
advantages outweigh the extra work required by the translators.

Please note that we have a new contribution policy:

https://gramps-project.org/wiki/index.php?title=Committing_policies#Contributing

When objecting to a pull request you should add a comment on GitHub
instead of emailing the author privately.

If your design is accepted on the list, then the next step is to open a
pull request.

I will look into the unit test issue separately.  All tests should pass
on all platforms.  The TestCaseGenerator is not actually a unit test
which is why it is excluded from Travis.  See:

https://github.com/gramps-project/gramps/blob/master/.travis.yml#L93

Regards,


Nick.


On 23/05/18 07:34, Paul Franklin wrote:

> Summary
>
> I would like to request a review, discussion, modification
> (if necessary), and whatever else might be required to
> commit the attached patchfile to 5.0.0, as I think it is
> important that 10078[1] be fixed before the "final" 5.0.0 is
> released.
>
>
> History
>
> The DB method get_person_handles has had the optional
> argument sort_handles for some time.  It is in 3.4.9 for
> instance.  Likewise the 3.4.9 Complete Individual report has
> a call of get_person_handles with sort_handles=True, which
> sorted the handles by the persons' surnames.
>
> For 5.0.0 that method's ability to provide sorted handles
> was augmented to sort on the person's given name along with
> their surname.  Just recently a user commented (on the list
> or in a bug report, I forget,) that he had about 250 people
> with a surname of "Smith" (IIRC) and it was for users like
> that that the given-name augmentation was added.  (I had
> noticed it in data.gramps also, which has a lot of Smiths.)
>
> Similarly, for 5.0.0 the feature request 8649[2] was
> implemented, adding the ability to sort family handles
> similarly, for a new filter in the Family Group report.
>
> Eventually, a user doing alpha testing of 5.0.0 (thanks!)
> had the Complete Individual report crash on him.  The
> discussion and ensuing debugging took a while, since the
> user had a large DB and didn't know where the problem was.
> But he persevered (thanks!) and was eventually able to
> isolate the problem individual, and Josip then noticed the
> person had no line like this in the gramps XML:
>
> <surname></surname>
>
> That is, they had no surname field at all, not even a null
> one -- only a first-name field.
>
> So the 10078 user used a text editor (with a multiline
> search capability) to find the offending people, half a
> dozen (IIRC), and fix them.
>
> Sometime later Sam added a comment to 10078 with two lines
> added to the XML exporter, which I don't recall the user
> ever commenting on (perhaps because his problem had been
> edited away).  Those two lines eventually turned into pull
> release 608[3].
>
> When a beta-tester second user filed a bug with the same
> problem[4], Paul looked into it and commented on p.r. 608
> that he felt another approach was better, then filed pull
> request 613[5] accordingly.  It modifies the DB layer to add
> a test for a missing surname, so that the DB layer no longer
> crashes.
>
> Paul also commented that it was likely that some gramps
> importer was (or importers were) causing the problem, which
> had just never been noticed before now, since only the 5.0.0
> changes were provoking it.
>
> Subsequently, Sam cancelled his p.r. 608 in favor of Paul's
> p.r. 613.
>
> I didn't like the approach in 613, since I don't like the
> idea of slowing down the DB layer with an "if" -- especially
> when it may be the case that only some DBs have the problem,
> maybe even a small number.  We can't know for sure of
> course, since there has only been a limited amount of alpha
> and beta testing of 5.0.0.
>
> So I emailed a preliminary version of the attached patchfile
> to Paul, on about 12 May (which was before "rc1" of course),
> and also asked him not to commit 613.  He never responded; I
> assume because he was "on the road" and then had to do the
> 5.0.0-rc1 release after he got back.
>
>
> Proposal and Argument
>
> So I am taking the somewhat-unusual step of attaching the
> updated patchfile and asking that it be committed in time
> for 5.0.0, as that may happen relatively quickly, I don't
> know.
>
> As you will see when you look at the attached file, it
> modifies these files:
>
> gramps/plugins/export/exportxml.py
> gramps/plugins/importer/importxml.py
> gramps/plugins/importer/test/importvcard_test.py
> gramps/plugins/lib/libgedcom.py
> gramps/plugins/textreport/familygroup.py
> gramps/plugins/textreport/indivcomplete.py
> gramps/plugins/tool/check.py
>
> The XML and GEDCOM importers were modified to add a null
> surname if necessary, should a user import a person which
> needs one.  (I didn't look at the other importers but my
> guess is that these two will cover most such imports.  I
> think the 10078 user said he'd done his import years
> earlier, I assume via GEDCOM.)
>
> I am by no means a GEDCOM expert but it seems to be the case
> that only a GEDCOM like this triggers it:
>
> 0 @I0000@ INDI
> 1 NAME Firstname
> 2 GIVN Firstname
> 1 SEX M
>
> since gramps correctly assigns a null surname if instead the
> GEDCOM is:
>
> 0 @I0000@ INDI
> 1 NAME Firstname //
> 2 GIVN Firstname
> 1 SEX M
>
> so evidently, at least in the past, some genealogy programs
> exported the former (incorrect I assume) format, whereas
> others exported the latter (correct I assume) format.
>
> (I believe gramps does "the right thing" and assigns a null
> surname when a user creates a person with only a given name,
> at least in my testing, so nothing needs fixing there.)
>
> To cover the case where the user's gramps DB already has
> people with a missing surname, I incorporated Sam's p.r. 608
> patch into the attached patchfile, since that pull release
> modifies the XML exporter to assign such no-surname people a
> null surname.
>
> However that p.r. also caused Travis to fail, since the
> vcard importer is (apparently) similarly faulty.  (I didn't
> look, since I assume that very few people will import a
> vcard DB into gramps, but if needed I imagine it could be
> fixed similarly.)  So another modified module fixes the
> vcard unit test which used to fail.
>
> (Parenthetically, let me note that I tried to test the
> attached patchfile, by doing things like "python3 setup.py
> test" but I got failures.  I also got failures when I did
> the same thing on an unpatched 5.0.0-beta1 so I stopped
> trying to see if I could test this patch.  Somebody who
> knows more about testing will need to look into it.  I can't
> tell if Travis is ignoring tests which involve the
> TestcaseGenerator for instance.  Nor could I get the
> imports_test.py module to run stand-alone, even in an
> unpatched gramps.  So clearly I am ignorant about such
> things, or have a faulty machine, or whatever.  However, I
> have of course tested the attached code, and to the best of
> my knowledge it all works.)
>
> Note that none of the patches I have mentioned so far
> involve any strings, so the fact that gramps is under a
> "string freeze" shouldn't inhibit their commitment.
>
> Note also that if only those which I have already mentioned
> were committed, then gramps would be better.  If a user's DB
> triggered a crash -- which can only happen in the Complete
> Individual or Family Group reports, to the best of my
> knowledge -- if they do get a crash they can be instructed
> to export and then import their DB as XML (into an empty DB
> of course), which is one of the standard ways we advise
> users to fix some problems.
>
> However, the attached patchfile also modifies those two
> reports, to not crash when the user has an offending
> database.  It does that by detecting the DB-layer crash (an
> "IndexError") inside a try/except, then advises the user
> what to do to fix it.
>
> That is one reason I requested Paul not to commit his
> p.r. 613, as that crash is what I detect in my two modified
> reports.  Another reason is that 613 doesn't really fix
> anything in the database; it just makes it so that gramps
> doesn't crash with a DB like that.  Whereas the attached
> patchfile not only doesn't crash (in those two reports) but
> it also fixes the underlying problem(s).  I feel that
> suspenders-and-belt approach (or braces-and-belt if you
> prefer) is a better one, more comprehensive.
>
> Note that while the two modified reports have strings in
> their patches, those strings are already used in gramps, so
> they too won't affect the "string freeze."
>
> The final module the attached patchfile changes is the tool
> to check and repair the database, since the 10078 user
> reported that he had tried running that tool but it hadn't
> helped him.  So I put in a test for a person with a missing
> Surname field which added one to any person who needed one.
>
> The two strings added to patch plugins/tool/check.py are the
> only two new strings added by this patch.  I claim that
> gramps will be better if the tool is so modified, even if
> those two new strings end up not being translated, since the
> lines in the two reports which say to run that tool are old
> ones and thus are already translated, so the user will be
> told to run it.  The patched tool will still correct the DB
> problem, even if the summary information has an untranslated
> line or two.
>
> As an alternative, if the "string freeze" really is written
> in a tablet of stone, two other strings could be used in the
> new section of check.py instead, some already-existing ones,
> which could be thought of as placeholders and changed to the
> correct ones as soon as the final 5.0.0 is released, for
> 5.0.1.  I don't like that alternative, but I suppose it's
> possible.
>
> Note that my patches are not time-sensitive.  They will not
> be done for DB accesses.  If the DB already has a problem,
> either a report will notice it and tell the user to run the
> tool, or else the user will happen to import or export their
> data and the problem will also be fixed, forevermore.  And
> with the GEDCOM and XML importers no longer causing a
> problem for anything imported new, these no-surname problems
> in gramps DBs should slowly disappear.
>
> Note that I am only talking about the BSDDB database and its
> layer.  When I converted a BSDDB DB to SQLite and then ran
> those two reports, neither crashed.  However, when I then
> converted that converted SQLite DB back into a BSDDB DB,
> that doubly-converted DB still (or "also") has the problem.
> So it can't be solved by telling the user to convert and
> then convert back.
>
> So that's what I propose, and I hope the decision is made to
> commit the attached patchfile (or some slightly-altered
> version, if necessary).
>
> Although as I said, I don't know what Travis will do, and I
> can imagine that somebody who understands testing will need
> to modify some test module, if it then starts failing after
> these patches.  (Although why some test should expect a
> person to have no Surname at all I can't imagine.)
>
> Thank you.
>
>
>
> [1]
> https://gramps-project.org/bugs/view.php?id=10078
>
> [2]
> https://gramps-project.org/bugs/view.php?id=8649
>
> [3]
> https://github.com/gramps-project/gramps/pull/608
>
> [4]
> https://gramps-project.org/bugs/view.php?id=10577
>
> [5]
> https://github.com/gramps-project/gramps/pull/613



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: request for review, for 5.0.0

prculley
In reply to this post by Paul Franklin-5
Regarding my db patch to fix for missing surnames;
Paul, sorry I did not get back to you sooner, had heart surgery, travel, and a lot of other items to take care of lately.

I am concerned that there are a fair number of users out there that could have the issue.  While your patch will alert them to the problem and make them run Check/Repair, I regard this as intrusive to the user, and not really necessary.

In my opinion, if the XML DTD allows no surname tag, that implies we don't need an empty value in the internal person's surname list.  I'll defer to Nick or others on whether an empty surname list is valid or not, I just note that it has been that way for a long time now...

Since your concern seem to be about my patch performance, I thought I would see how much it affects the db call.  Turns out it adds 0.03usec per person in the db on my machine.  So a 10000 person db would add 0.3millisec to the export or report.  Not much of a performance hit.

If you want better performance, we can do some micro-optimization of the associated statement since we are repeating the indexing several times:
Original:
    if data[3][5]:  # if Surname available
        fullname_data = [(data[3][5][0][0] + ' ' + data[3][4],  # combined
                          data[3][5][0][1], data[3][5][0][2],
                          data[3][5][0][3], data[3][5][0][4])]
    else:  # Some importers don't add any Surname at all
        fullname_data = [(' ' + data[3][4], '', True, (1, ''), '')]
Optimized:
    d1=data[3][5]
    if d1:  # if Surname available
        d2 = d1[0]
        fullname_data = [(d2[0] + ' ' + data[3][4],  # combined
                          d2[1], d2[2],
                          d2[3], d2[4])]
    else:  # Some importers don't add any Surname at all
        fullname_data = [(' ' + data[3][4], '', True, (1, ''), '')]

The original code runs 0.78usec, the original patch 0.81use, the optimized version runs 0.50usec per person.  We could save 3.1millisec on our 10000 person export/report, yippie woo hoo...

Paul C.

On Wed, May 23, 2018 at 1:34 AM, Paul Franklin <[hidden email]> wrote:
Summary

I would like to request a review, discussion, modification
(if necessary), and whatever else might be required to
commit the attached patchfile to 5.0.0, as I think it is
important that 10078[1] be fixed before the "final" 5.0.0 is
released.


History

The DB method get_person_handles has had the optional
argument sort_handles for some time.  It is in 3.4.9 for
instance.  Likewise the 3.4.9 Complete Individual report has
a call of get_person_handles with sort_handles=True, which
sorted the handles by the persons' surnames.

For 5.0.0 that method's ability to provide sorted handles
was augmented to sort on the person's given name along with
their surname.  Just recently a user commented (on the list
or in a bug report, I forget,) that he had about 250 people
with a surname of "Smith" (IIRC) and it was for users like
that that the given-name augmentation was added.  (I had
noticed it in data.gramps also, which has a lot of Smiths.)

Similarly, for 5.0.0 the feature request 8649[2] was
implemented, adding the ability to sort family handles
similarly, for a new filter in the Family Group report.

Eventually, a user doing alpha testing of 5.0.0 (thanks!)
had the Complete Individual report crash on him.  The
discussion and ensuing debugging took a while, since the
user had a large DB and didn't know where the problem was.
But he persevered (thanks!) and was eventually able to
isolate the problem individual, and Josip then noticed the
person had no line like this in the gramps XML:

<surname></surname>

That is, they had no surname field at all, not even a null
one -- only a first-name field.

So the 10078 user used a text editor (with a multiline
search capability) to find the offending people, half a
dozen (IIRC), and fix them.

Sometime later Sam added a comment to 10078 with two lines
added to the XML exporter, which I don't recall the user
ever commenting on (perhaps because his problem had been
edited away).  Those two lines eventually turned into pull
release 608[3].

When a beta-tester second user filed a bug with the same
problem[4], Paul looked into it and commented on p.r. 608
that he felt another approach was better, then filed pull
request 613[5] accordingly.  It modifies the DB layer to add
a test for a missing surname, so that the DB layer no longer
crashes.

Paul also commented that it was likely that some gramps
importer was (or importers were) causing the problem, which
had just never been noticed before now, since only the 5.0.0
changes were provoking it.

Subsequently, Sam cancelled his p.r. 608 in favor of Paul's
p.r. 613.

I didn't like the approach in 613, since I don't like the
idea of slowing down the DB layer with an "if" -- especially
when it may be the case that only some DBs have the problem,
maybe even a small number.  We can't know for sure of
course, since there has only been a limited amount of alpha
and beta testing of 5.0.0.

So I emailed a preliminary version of the attached patchfile
to Paul, on about 12 May (which was before "rc1" of course),
and also asked him not to commit 613.  He never responded; I
assume because he was "on the road" and then had to do the
5.0.0-rc1 release after he got back.


Proposal and Argument

So I am taking the somewhat-unusual step of attaching the
updated patchfile and asking that it be committed in time
for 5.0.0, as that may happen relatively quickly, I don't
know.

As you will see when you look at the attached file, it
modifies these files:

gramps/plugins/export/exportxml.py
gramps/plugins/importer/importxml.py
gramps/plugins/importer/test/importvcard_test.py
gramps/plugins/lib/libgedcom.py
gramps/plugins/textreport/familygroup.py
gramps/plugins/textreport/indivcomplete.py
gramps/plugins/tool/check.py

The XML and GEDCOM importers were modified to add a null
surname if necessary, should a user import a person which
needs one.  (I didn't look at the other importers but my
guess is that these two will cover most such imports.  I
think the 10078 user said he'd done his import years
earlier, I assume via GEDCOM.)

I am by no means a GEDCOM expert but it seems to be the case
that only a GEDCOM like this triggers it:

0 @I0000@ INDI
1 NAME Firstname
2 GIVN Firstname
1 SEX M

since gramps correctly assigns a null surname if instead the
GEDCOM is:

0 @I0000@ INDI
1 NAME Firstname //
2 GIVN Firstname
1 SEX M

so evidently, at least in the past, some genealogy programs
exported the former (incorrect I assume) format, whereas
others exported the latter (correct I assume) format.

(I believe gramps does "the right thing" and assigns a null
surname when a user creates a person with only a given name,
at least in my testing, so nothing needs fixing there.)

To cover the case where the user's gramps DB already has
people with a missing surname, I incorporated Sam's p.r. 608
patch into the attached patchfile, since that pull release
modifies the XML exporter to assign such no-surname people a
null surname.

However that p.r. also caused Travis to fail, since the
vcard importer is (apparently) similarly faulty.  (I didn't
look, since I assume that very few people will import a
vcard DB into gramps, but if needed I imagine it could be
fixed similarly.)  So another modified module fixes the
vcard unit test which used to fail.

(Parenthetically, let me note that I tried to test the
attached patchfile, by doing things like "python3 setup.py
test" but I got failures.  I also got failures when I did
the same thing on an unpatched 5.0.0-beta1 so I stopped
trying to see if I could test this patch.  Somebody who
knows more about testing will need to look into it.  I can't
tell if Travis is ignoring tests which involve the
TestcaseGenerator for instance.  Nor could I get the
imports_test.py module to run stand-alone, even in an
unpatched gramps.  So clearly I am ignorant about such
things, or have a faulty machine, or whatever.  However, I
have of course tested the attached code, and to the best of
my knowledge it all works.)

Note that none of the patches I have mentioned so far
involve any strings, so the fact that gramps is under a
"string freeze" shouldn't inhibit their commitment.

Note also that if only those which I have already mentioned
were committed, then gramps would be better.  If a user's DB
triggered a crash -- which can only happen in the Complete
Individual or Family Group reports, to the best of my
knowledge -- if they do get a crash they can be instructed
to export and then import their DB as XML (into an empty DB
of course), which is one of the standard ways we advise
users to fix some problems.

However, the attached patchfile also modifies those two
reports, to not crash when the user has an offending
database.  It does that by detecting the DB-layer crash (an
"IndexError") inside a try/except, then advises the user
what to do to fix it.

That is one reason I requested Paul not to commit his
p.r. 613, as that crash is what I detect in my two modified
reports.  Another reason is that 613 doesn't really fix
anything in the database; it just makes it so that gramps
doesn't crash with a DB like that.  Whereas the attached
patchfile not only doesn't crash (in those two reports) but
it also fixes the underlying problem(s).  I feel that
suspenders-and-belt approach (or braces-and-belt if you
prefer) is a better one, more comprehensive.

Note that while the two modified reports have strings in
their patches, those strings are already used in gramps, so
they too won't affect the "string freeze."

The final module the attached patchfile changes is the tool
to check and repair the database, since the 10078 user
reported that he had tried running that tool but it hadn't
helped him.  So I put in a test for a person with a missing
Surname field which added one to any person who needed one.

The two strings added to patch plugins/tool/check.py are the
only two new strings added by this patch.  I claim that
gramps will be better if the tool is so modified, even if
those two new strings end up not being translated, since the
lines in the two reports which say to run that tool are old
ones and thus are already translated, so the user will be
told to run it.  The patched tool will still correct the DB
problem, even if the summary information has an untranslated
line or two.

As an alternative, if the "string freeze" really is written
in a tablet of stone, two other strings could be used in the
new section of check.py instead, some already-existing ones,
which could be thought of as placeholders and changed to the
correct ones as soon as the final 5.0.0 is released, for
5.0.1.  I don't like that alternative, but I suppose it's
possible.

Note that my patches are not time-sensitive.  They will not
be done for DB accesses.  If the DB already has a problem,
either a report will notice it and tell the user to run the
tool, or else the user will happen to import or export their
data and the problem will also be fixed, forevermore.  And
with the GEDCOM and XML importers no longer causing a
problem for anything imported new, these no-surname problems
in gramps DBs should slowly disappear.

Note that I am only talking about the BSDDB database and its
layer.  When I converted a BSDDB DB to SQLite and then ran
those two reports, neither crashed.  However, when I then
converted that converted SQLite DB back into a BSDDB DB,
that doubly-converted DB still (or "also") has the problem.
So it can't be solved by telling the user to convert and
then convert back.

So that's what I propose, and I hope the decision is made to
commit the attached patchfile (or some slightly-altered
version, if necessary).

Although as I said, I don't know what Travis will do, and I
can imagine that somebody who understands testing will need
to modify some test module, if it then starts failing after
these patches.  (Although why some test should expect a
person to have no Surname at all I can't imagine.)

Thank you.



[1]
https://gramps-project.org/bugs/view.php?id=10078

[2]
https://gramps-project.org/bugs/view.php?id=8649

[3]
https://github.com/gramps-project/gramps/pull/608

[4]
https://gramps-project.org/bugs/view.php?id=10577

[5]
https://github.com/gramps-project/gramps/pull/613

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: request for review, for 5.0.0

Paul Franklin-5
Thank you both for your responses.  That was certainly one
reason for the post, to gather comments and start a
discussion about the problem -- although I admit that I
hadn't considered the possibility of first deciding whether
it is even a problem at all.

But let me first say that I certainly meant no criticism
when I said that you hadn't responded Paul.  I was just
stating it.  After all, you had already said you were "on
the road" and then did the "rc1" release, so I knew you had
been busy.  I'm glad to hear your heart surgery happened but
as we both know, "life goes on."  I remember a few years ago
when I had a procedure done to my heart and when they
released me from the hospital I lied to them and said I was
taking a taxi home and then instead drove home in my own
car.  We can all be thankful modern medicine is so good.

As to whether the DTD allows Name objects to not have a
Surname child, I certainly admit that I never considered the
possibility.  On the other hand, I observe (in places like
the data.gramps file) that we are now up to revision "1.7.1"
so that certainly implies to me that the DTD has changed in
the past, and thus that it could be changed again.  I assume
that as the Architect that you have full authority over
that, Nick.  But it certainly seems to me that whatever your
decision is, it will have a bearing on this discussion.

Let me also mention that I find it a little bit strange that
running the "Check and Repair Database" tool can be thought
of as "intrusive" since it certainly seems to me that the
tool's sole purpose is to check "the database for integrity
problems, fixing the problems that it can".  If it wasn't
meant to be run, why do we even have it?  Once it is
patched, running it once will fix the surname fields.  As I
said, the 10078 user thought to run it, unprompted.  And the
string telling the user to run it is already in gramps.

Of course if the architectural decision is made that a Name
object doesn't have to have a child Surname object, have an
empty list of surnames, then I imagine that it certainly
follows that such a DB does not have a "problem" at all.

So perhaps in that case the existing gramps code should be
inspected for places where such a null Surname object is
currently made and those places changed to remove the null
Surname object creation, since it doesn't follow the DTD?

But if a Surname object is optional, then I see no reason
why my patches can't be applied, even if it isn't mandatory.

After all, the patches arguably don't create a "real"
Surname, even a null-string one.  They just create a null
Surname, a totally empty one, inside a new surname list.

As far as execution timing is concerned, I'm sure you
remember Paul that you and I have had discussions in the
past about the speeds of our respective machines.  I can
recall instances where you couldn't recreate a bug I was
observing until you slowed down gramps on your machine, for
instance.

I see nothing wrong with being philosophically against
slowing down the DB layer no matter what the reason, no
matter what the code, no matter how small a time is added.
But that seems like an architectural decision to me and
again I defer to Nick.

I will add however, that as I understand it, gramps is now
trying a bit harder to have such design decisions discussed
on the list -- before they are coded and implemented.  Since
as we have all observed in the past, once a developer codes
a solution to what he thinks is a problem, he can then
become defensive about it, or become reluctant to change it.

So in the spirit of "truth in advertising" I will mention
that as a "git blame" will show, I was the one who added the
find_fullname method to plugins/db/bsddb/read.py, a little
over two years ago (in 7f41373f, for 8649 as I earlier
mentioned).  But if its code can be optimized and made
faster, I not only have no objection but I would imagine it
would be a good idea.  But I consider that an architectural
decision and again I would defer to Nick.

Concerning testing, as I said I admit my general ignorance
in that area.  However I will observe that to the best of my
knowledge there are many "test" modules in gramps which are
not what I would call a unit test at all, but are much
larger.  Thank you for looking into testing, Nick.

I would hope that a goal would be to (once again?) allow
stand-alone testing, for instance by the "python3 setup.py
test" I mentioned.  And if there are optional modules or
addons, I would hope that the code could be modified to not
call their absence a failure.  And of course if they are not
optional but are required -- I haven't the slightest idea
what the CLImerge "addon" does for instance, or other things
which Travis apparently downloads -- then the tests should
be altered to fail upon its/their absence, with a useful
message telling the stand-alone developer what to do.

If the TestcaseGenerator is not meant to be run in any
bigger framework, then perhaps the test module which does
that should be renamed or moved out of the general "gramps"
tree and into the data/tests tree instead?  Just a thought.
I don't claim to understand these things, as I've said.

In summary, if Paul thinks my patch is "not really
necessary" perhaps that is the end of it.  After all, Nick
said he awaits "comments from other developers, especially
Paul Culley."

On the other hand, Paul said "I'll defer to Nick or others
on whether an empty surname list is valid or not."  So it
seems to me the architectural decision is Nick's to make.

After listening to "other developers" of course.  So Paul
thinks my patch is "not really necessary" and I feel that it
makes gramps better and fixes a problem.  Two people with
two different opinions.  Life is like that, sometimes.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: request for review, for 5.0.0

Paul Franklin-5
I am not familiar with DTD (Document Type Definition)
syntax, and don't have the time to read up on it right now.
(Besides, at the moment I am using a dialup connection and
so I have to go to a library if I want higher bandwidth.)

However, if (as suggested) I look at grampsxml.dtd I see:

<!ELEMENT person (gender, name*, eventref*, lds_ord*,
                  objref*, address*, attribute*, url*, childof*,
                  parentin*, personref*, noteref*, citationref*, tagref*)>
<!ATTLIST person
        id        CDATA #IMPLIED
        handle    ID    #REQUIRED
        priv      (0|1) #IMPLIED
        change    CDATA #REQUIRED
>

That seems to imply to me that ELEMENTs of a "person" are
the possible sub-elements (as opposed to the ATTLISTs of a
"person" which are clearly things on the same XML line).

So I am still guessing, but I'd guess that the element
definition says that a "person" must have a "gender" whereas
all the other possibilities, the ones followed by an
asterisk, are optional, including a "name" of the "person".

Note that "gender" has three choices, so while a "person"
must have a "gender" it can be specified as an unknown
gender, or male or female.  But that is only specified in a
comment:

<!--
GENDER has values of M, F, or U.
-->
<!ELEMENT gender  (#PCDATA)>

since the actual ELEMENT definition says "gender" is a
(#PCDATA) -- although the Person dialog certainly doesn't
allow you to erase one of the standard three choices and
write in some string of your own, no matter what the DTD
says.  So that's a difference between the DTD and gramps.

Moving on, a few lines lower we see:

<!ELEMENT name    (first?, call?, surname*, suffix?, title?, nick?,
familynick?, group?,
                  (daterange|datespan|dateval|datestr)?, noteref*,
citationref*)>

Again assuming these ELEMENTs of a "name" are the possible
sub-elements, we immediately notice that there is a question
mark or asterisk after all of them.  None of them are just
the element name (the way "gender" was for a "person").

So I assume the question mark or asterisk means they are all
optional, in some way or another.  But note that "first" is
specified the same way as "call" and "suffix" and "title"
and "nick" and "familynick" -- and so, guessing that the
question mark states the type of optionality those fields
have, a "first" name doesn't have to be present at all, any
more than a call name does, or a nickname, or a person's
title or suffix.  That the first name is totally optional.

Whereas the "surname" has an asterisk after it.  Yet the
"noteref" and the "citationref" also have an asterisk after
them, which implies to me -- remember I am still guessing --
that a person's surname has the same type of optionality as
a noteref or a citationref.  So I have to admit that doesn't
seem right to me.  So am I misunderstanding or guessing
wrong -- or is the DTD is slightly incorrect, somehow?

We also see:

<!ATTLIST name
        alt       (0|1) #IMPLIED
        type      CDATA #IMPLIED
        priv      (0|1) #IMPLIED
        sort      CDATA #IMPLIED
        display   CDATA #IMPLIED
>

Indeed, it's possible to create a "name" without any of
those attributes, although you have to work at it a little,
erasing the default "Birth Name" type which the dialog
initially presents you.

Note also that the comment above those lines is just that, a
comment:

<!-- (Unknown|Also Know As|Birth Name|Married Name|Other Name) -->

There is evidently no formal definition of those things in
the DTD -- the way that one of the possible elements of a
"name" was defined as (daterange|datespan|dateval|datestr)?
with the pipes in between the different possible choices.

Note also a typo in that comment, as the string in gramps is
really "Also Known As" -- so there's another example that
the DTD isn't perfect.  In addition, the "Other Name" choice
is not presented to you as a possibility in the Person
dialog, despite being in the DTD's comment definition.

And then we have this group:

<!ELEMENT first      (#PCDATA)>
<!ELEMENT call       (#PCDATA)>
<!ELEMENT suffix     (#PCDATA)>
<!ELEMENT title      (#PCDATA)>
<!ELEMENT nick       (#PCDATA)>
<!ELEMENT familynick (#PCDATA)>
<!ELEMENT group      (#PCDATA)>
<!ELEMENT surname    (#PCDATA)>

where evidently there is no distinction made between any of
those.  A "first" is the same as a "surname" is the same as
a nickname.  So I'm guessing that the (#PCDATA) just
describes the contents of those possibilities, with CDATA
being different than PCDATA and maybe different yet with a
number-sign/hashmark in front of it.

And right underneath that last line we have:

<!ATTLIST surname
        prefix      CDATA #IMPLIED
        prim        (1|0) #IMPLIED
        derivation  CDATA #IMPLIED
        connector   CDATA #IMPLIED
>

And indeed the usual "surname" doesn't have any of those
attributes at all.

The important thing however, at least for me writing all
this down in this thread, is that I don't see any DTD
specification for the inter-relationship of the elements of
a "name".

That is, I don't see anything which says something like:

  If you have a non-null "first" you must also have at least
  a null "surname" too.

And I'd certainly imagine there should be some specification
saying something like:

  A non-null "group" requires a non-null "surname" too.

Or:

  A non-null "suffix" requires a non-null "surname" too.

Note that if you leave the given name and surname fields in
a Person dialog totally empty, and erase the name type
choice, you get this in the exported XML:

<name>
  <surname></surname>
</name>

That seems correct to me.

But it doesn't look to me like it follows the DTD.

So IMHO citing what the DTD currently says is an
insufficient rationale for arguing for or against what I am
proposing.

I think we should be arguing based on what is "the right
thing" to have gramps do -- and thus discussing what we each
feel "the right thing" is, and why.



If I look at the grampsxml.rng "RELAX NG schema" file, it
seems to be an alternate way to specify the gramps XML.  I
am guessing that they both attempt to describe the same
thing however, just in different ways.  Although again I am
just guessing, since I have never heard of "RELAX NG" before
(and don't have the time to go to relaxng.org and look into
it).

It seems to be easier to guess at however, since we see
things like:

      <optional><element name="people">
          <optional><attribute name="default"><text/></attribute></optional>
          <optional><attribute name="home">
              <data type="IDREF"/>
          </attribute></optional>
          <zeroOrMore><element name="person">
              <ref name="person-content"/>
          </element></zeroOrMore>
      </element></optional>

So that pretty clearly seems to say that both "default" and
"home" are an "optional" "attribute" of "people" whereas
"people" can have zero-or-more "person" "element"s.  So we
see that "optional" is not the same as "zeroOrMore" -- even
if the difference is defined off in relaxng.org somewhere.

And we see that "default" is <text/> whereas "home" is <data
type="IDREF"/> instead, and a "person" consists of
"person-content".  So far so good.

I won't paste in the whole "person-content" definition since
it is too long, but it starts off this way:

  <define name="person-content">
    <ref name="primary-object"/>
    <element name="gender"><choice>
        <value>M</value>
        <value>F</value>
        <value>U</value>
    </choice></element>
    <zeroOrMore><element name="name">
        <ref name="name-content"/>
    </element></zeroOrMore>

So we immediately notice a different between "gender" which
is an unqualified "element", and "name" which says
zero-or-more.  So I am guessing that means that a "gender"
field must always be present for a "person".

Furthermore, in this specification the gender is limited to
a "choice" of only three letters, not the more-general
specification the DTD has.  So the gramps Person dialog
agrees with this specification, even if not the DTD.  And
the DTD disagrees with this schema, formally at least.

Then later we see the name-content definition:

  <define name="name-content">
    <optional><attribute name="alt"><choice>
        <value>0</value>
        <value>1</value>
    </choice></attribute></optional>
    <optional><attribute name="priv">
        <ref name="priv-content"/>
    </attribute></optional>
    <optional><attribute name="type"><text/></attribute></optional>
    <optional><attribute name="sort"><text/></attribute></optional>
    <optional><attribute name="display"><text/></attribute></optional>
    <optional><element name="first"><text/></element></optional>
    <optional><element name="call"><text/></element></optional>
    <zeroOrMore><element name="surname">
        <ref name="surname-content"/>
    </element></zeroOrMore>
    <optional><element name="suffix"><text/></element></optional>
    <optional><element name="title"><text/></element></optional>
    <optional><element name="nick"><text/></element></optional>
    <optional><element name="familynick"><text/></element></optional>
    <optional><element name="group"><text/></element></optional>
    <optional><ref name="date-content"/></optional>
    <zeroOrMore><element name="noteref">
        <ref name="noteref-content"/>
    </element></zeroOrMore>
    <zeroOrMore><element name="citationref">
        <ref name="citationref-content"/>
    </element></zeroOrMore>
  </define>

Pretty clearly this says that name-content can have five
optional attributes, eight optional elements, and three
zero-or-more elements: surname, noteref, and citationref.

So that's consistent with the asterisk vs. question-mark in
the DTD's "name" specification:

<!ELEMENT name    (first?, call?, surname*, suffix?, title?, nick?,
familynick?, group?,
                  (daterange|datespan|dateval|datestr)?, noteref*,
citationref*)>

But as I said, gramps doesn't produce that at the moment,
since there is not a null noteref and null citationref field
for every "name" in a gramps XML.  Another difference
between the specifications and the gramps reality.

Then we see:

  <define name="surname-content">
    <text/>
    <optional><attribute name="prefix"><text/></attribute></optional>
    <optional><attribute name="prim"><choice>
    <value>1</value>
    <value>0</value>
    </choice></attribute></optional>
    <optional><attribute name="derivation"><text/></attribute></optional>
    <optional><attribute name="connector"><text/></attribute></optional>
  </define>

That seems to say that the content of a surname field
consists of <text/> and four optional attributes.  I am
therefore guessing that <text/> specifies a possibly-empty
string.

But it is not the content of the surname field that we are
discussing, as far as the possible patches of this thread
are concerned.

It is whether or not the field is required to be present.

That seems to me to be defined by the name-content
definition as zero-or-more as opposed to "optional".

So it seems to me that the "RELAX NG" specification for the
gramps XML requires a possibly-null "surname" field to be
always present in a "name" child sub-element, not optionally
totally absent.

I don't have the slightest idea whether gramps has to "obey"
the DTD specification or the "RELAX NG" specification,
somehow, or whether they are (somehow) only for our internal
use, for our internal guidance.

But it seems to me the two specifications are inconsistent.

And also that both of them are inconsistent with what gramps
does at the moment, in one way or another.

So it still seems to me that it is Nick's decision what to
do.

I just don't think either the DTD or the "RELAX NG"
specification can be used as a rationale.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel
Reply | Threaded
Open this post in threaded view
|

Re: request for review, for 5.0.0

Stephen Adams


On Thu, May 24, 2018 at 4:20 PM, Paul Franklin <[hidden email]> wrote:
I am not familiar with DTD (Document Type Definition)

Nor am I but :
 
  | So I assume the question mark or asterisk means they are all
  | optional, in some way or another.  


The syntax looks to use regular expressions (and that fits with my understanding of the use of gramps) where ? is 0 or 1 exactly and * is 0 or more.

So for example you may have 0 or 1 first names but for surnames you may have 0 or more.

Your interpretation looks correct, but  hope that helps.

Steve

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel