Sort order of non-latin characters in Gramps

adrian.davey
Is there any way in Gramps to set the sorting of non-latin characters?

Or, if Gramps effectively treats non-latin characters for sorting as
somehow equivalent to latin ones, can it also be configured to do the
same in a search?

For example, I think I am correct in saying that the Danish alphabet
includes the 26 latin characters A-Z, plus Æ, Ø and Å, which sort after
Z, and in that sequence. And the Swedish alphabet includes the 26 latin
characters A-Z, plus Å, Ä, and Ö, which again sort after Z.

But Gramps sorts names simply as if e.g. Å was an A, such that (for
example), instead of ÅKESSON being listed after Z, it is within the A
names (in my d/b, the successive family names after a name sort are
AITKEN, ÅKERLUND, ÅKERMAN, AKERS, ÅKESDOTTER, ÅKESSON, ALBERD, ALBERG).
Likewise Gramps sorts Ø as if it was O, so JØRGENSEN is not at the end
of the J... list, but within the JOR... listings (JORDAN, JØRGENSEN,
JORY). BÆK will be after BADGER and before BAILEY (rather than following
BYWORTH and preceding CABLE). And so on.

This behaviour of treating non-latin characters as the nearest
equivalent latin one would not necessarily be a problem if it were
consistent. Entering the string ÅKESSON with the default "Name contains"
in the Find bar at the top of the People view does find all instances of that
name, but entering AKESSON does not find any. (While I am now
surprisingly proficient at entering non-latin characters in Unicode,
more or less with my eyes shut, I also often use the trick of truncating
the string so it searches for e.g. "KESSON", which often provides a quick
workaround.)

The Gramps support for Unicode is extremely good, so I am a bit puzzled
how U+00C5 or U+00E5 (the upper case and lower case forms of Å) appears
to be treated the same as U+0041/U+0061 (A/a) for sorting anyway.
There must be some complicated wizardry hidden somewhere to achieve that!
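One way to get this effect, offered only as an approximation and not as a description of Gramps' internals, is Unicode canonical decomposition: Å (U+00C5) decomposes into A (U+0041) followed by a combining ring above (U+030A), so a sort key that discards combining marks files ÅKESSON among the A names. Ø and Æ have no decomposition at all, so collation libraries have to place them using their own tables. The Python sketch below uses only the standard `unicodedata` module; the helper name `base_letters` is hypothetical.

```python
import unicodedata


def base_letters(text: str) -> str:
    """Decompose text (NFD) and drop combining marks such as the ring in Å."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Å splits into "A" + U+030A COMBINING RING ABOVE.
print([hex(ord(ch)) for ch in unicodedata.normalize("NFD", "Å")])  # ['0x41', '0x30a']

names = ["AITKEN", "ÅKESSON", "AKERS", "ALBERG"]
# Sorting on the stripped form files ÅKESSON among the A names.
print(sorted(names, key=base_letters))  # ['AITKEN', 'AKERS', 'ÅKESSON', 'ALBERG']

# Ø has no decomposition, so it survives unchanged.
print(base_letters("JØRGENSEN"))  # JØRGENSEN
```

This also shows why decomposition alone cannot account for where JØRGENSEN or BÆK end up.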

I imagine there is an extremely complicated can of worms here.

Gramps is not likely to know which language (and therefore alphabet and
sort order) a particular Unicode character may belong to. It is
presumably the case that the same name containing the character Å
should, strictly speaking, be sorted differently according to whether it
is, say, Danish rather than Swedish.
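A language-specific order can be sketched as a sort key that ranks each letter by its position in an alphabet string. The two strings below simply encode the orders described in this message (A-Z then Æ, Ø, Å for Danish; A-Z then Å, Ä, Ö for Swedish). The names `DANISH`, `SWEDISH` and `alphabet_key` are hypothetical, and real locale collation (for example the tailorings shipped with ICU) handles far more cases than this minimal Python version.

```python
DANISH = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
SWEDISH = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"


def alphabet_key(alphabet: str):
    """Return a sort-key function that ranks letters by their position in alphabet."""
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(name: str) -> tuple:
        # Characters outside the alphabet (spaces, hyphens, É, ...) sort after it,
        # ordered by code point: a deliberate simplification.
        return tuple(rank.get(ch, len(alphabet) + ord(ch)) for ch in name.upper())

    return key


danish_names = ["JØRGENSEN", "BÆK", "ÅKESSON", "JORDAN", "BYWORTH",
                "CABLE", "BADGER", "AITKEN", "JORY", "BAILEY"]
print(sorted(danish_names, key=alphabet_key(DANISH)))
# ['AITKEN', 'BADGER', 'BAILEY', 'BYWORTH', 'BÆK', 'CABLE',
#  'JORDAN', 'JORY', 'JØRGENSEN', 'ÅKESSON']

swedish_names = ["GÖRANSSON", "ÅKESSON", "GUSTAFSSON", "ZETTERBERG", "AITKEN"]
print(sorted(swedish_names, key=alphabet_key(SWEDISH)))
# ['AITKEN', 'GUSTAFSSON', 'GÖRANSSON', 'ZETTERBERG', 'ÅKESSON']
```

Because the key is chosen per language, the same surname list can be indexed in either order; deciding which order applies to a given name is exactly the part that cannot be inferred from the characters alone.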

I am not at all expert in any of the (many) languages that use more than
the 26-character latin alphabet. Over the last year or so I have had a
bit of a baptism by fire in Danish & Swedish. Since my first use of
Gramps I have always needed to use diacritics, because my own ancestry
includes a line of Norman French names, but courtesy of Scandinavia I
now have a substantial population of names that use non-latin characters.

I also now need to generate reports that meet the expectations of both
the speakers of English and, say, Swedish.

So now when I distribute a dynamic web report to family members, I have
the complication that I have to explain to the English-speakers that
they will not find a name like ÅKESSON unless they paste the correct
character into the search boxes (or use the truncation trick), but I
also have to advise the Swedish-speakers that they will not find names
like ÅKESSON or GÖRANSSON where they expect them in the indexes, but in
an "English" version of their own language order!

Is it possible to set the sort order?

Is sort order a consequence of the OS locale setting (English Australia in
my case, and which, for a host of reasons, I would be extremely wary of
changing)? If a Dynamic Web Report (DWR) is generated on a machine with one
locale, does the index sort order change if the report is used on a machine
set to a different locale?

Is it possible to tweak searching so it will find e.g. LAINÉ if the
user just enters LAINE? Or return JØRGENSEN when they enter JORGENSEN?
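One common approach is to "fold" both the stored name and the search string to a diacritic-free, case-folded form before comparing them. The minimal Python sketch below assumes that approach; `fold`, `name_contains` and the `EXTRA_FOLDS` table are hypothetical, not Gramps functions. Ø and Æ need the explicit table because, unlike É or Å, they do not decompose into a base letter plus a mark.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# This table is an assumption chosen for the examples in this thread.
EXTRA_FOLDS = str.maketrans({"Ø": "O", "ø": "o", "Æ": "AE", "æ": "ae"})


def fold(text: str) -> str:
    """Reduce text to a diacritic-free, case-folded form for matching."""
    text = text.translate(EXTRA_FOLDS)
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def name_contains(name: str, query: str) -> bool:
    """Hypothetical accent-insensitive version of a 'Name contains' test."""
    return fold(query) in fold(name)


print(name_contains("LAINÉ", "LAINE"))          # True
print(name_contains("JØRGENSEN", "JORGENSEN"))  # True
print(name_contains("ÅKESSON", "akesson"))      # True
print(name_contains("ÅKESSON", "KESSON"))       # True (truncation still works)
```

The trade-off is that folding discards distinctions that matter to Danish or Swedish readers, so it suits searching better than sorting.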

I would be interested to hear the views of people who have more
experience of these issues!

--
Adrian Davey | GrampsAIO64-5.0.1-1 | W10P



--
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users
https://gramps-project.org

Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
Adrian,

This is a thorny issue. And Swedish/Danish difficulties have specifically been topics of discussion all the way back in the version 3 documentation.

You probably want to explore the Alternative Name options. They allow you to override sorting and grouping, and to make the Latin variant invisible yet searchable ... all independently of the Display name.
https://www.gramps-project.org/wiki/index.php/Gramps_5.0_Wiki_Manual_-_Entering_and_editing_data:_detailed_-_part_3

There have been recent discussions about the Alternative Name feature that you would've missed as a new subscriber to this mailing list.
A couple of these archived threads are linked in the 'See also' section at the bottom of the following page:
https://gramps-project.org/wiki/index.php/Grouping_Surnames
(This is a good way to be introduced to exploring the archives too.)

-Brian

On Sun, Dec 1, 2019 at 18:48, Adrian Davey wrote:
> [quoted text hidden]

Re: Sort order of non-latin characters in Gramps

StoltHD
This applies to most Unicode double-byte characters; Norwegian has the same problem...

But it sorts correctly when I use Norwegian as the computer's input language and Norwegian in Gramps... But if I switch to English on a computer with Norwegian Windows, it sorts the wrong way...

So this is not an OS/Windows problem; it is a Gramps/Python problem when English is used as the language in Gramps...

Jaran

On Mon, 2 Dec 2019 at 10:58, StoltHD <[hidden email]> wrote:
> [quoted text hidden]
>
> On Mon, 2 Dec 2019 at 04:48, Emyoulation--- via Gramps-users <[hidden email]> wrote:
> > [quoted text hidden]

Re: Sort order of non-latin characters in Gramps

StoltHD
To clarify... I switch the language in Gramps, not the input/system language of my computer...

Jaran

On Mon, 2 Dec 2019 at 11:26, StoltHD <[hidden email]> wrote:
> [quoted text hidden]

Re: Sort order of non-latin characters in Gramps

Erik P. Olsen
In reply to this post by StoltHD
Same with Danish.
 
--
Erik


On 2019-12-02 at 11:26:40 StoltHD wrote:

> [quoted text hidden]

--
Erik P. Olsen - Copenhagen, Denmark
Fedora 30/64 bit Linux xfce Claws-Mail POP3 Gramps 5.1.1


Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list

There was an interesting discussion about the contrariness of how different languages sort Unicode here:
https://stackoverflow.com/questions/16538605/utf8-collation-difference-between-unicode-and-danish

One point was about some languages even having MULTIPLE sort rules... depending on the application. (Their example was German, where dictionaries and phone directories sort umlauted letters differently.)
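The German example can be sketched with two sort keys. The conventions used here (plain vowels for dictionaries, vowel-plus-e for name lists such as phone directories) are the ones commonly cited as the two variants of DIN 5007; the key functions are hypothetical, handle only lower-case umlauts after `str.lower()`, and use the original spelling just to break ties.

```python
# Two German conventions for umlauts (commonly cited as DIN 5007 variants 1 and 2).
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(name: str) -> tuple:
    # Umlauts count as the plain vowel; the original string breaks ties.
    return (name.lower().translate(DICTIONARY), name)


def phonebook_key(name: str) -> tuple:
    # Umlauts expand to vowel + "e", so Müller files next to Mueller.
    return (name.lower().translate(PHONEBOOK), name)


names = ["Müller", "Muff", "Mueller"]
print(sorted(names, key=dictionary_key))  # ['Mueller', 'Muff', 'Müller']
print(sorted(names, key=phonebook_key))   # ['Mueller', 'Müller', 'Muff']
```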

But equally interesting is a discussion mentioning SQLite's approach to Unicode that provides a BYOS (Bring Your Own Sorter) policy.
See:

Since Gramps is Open Source, maybe there's an opportunity for someone to add dynamic sorting-policy options to the internationalization features of our favorite program.
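Python's standard `sqlite3` module exposes SQLite's bring-your-own-collation hook as `Connection.create_collation()`. The sketch below registers a hypothetical `danish` collation built from the A-Z + Æ, Ø, Å order described earlier in the thread. The table, column and sample names are illustrative only, and the sketch makes no claim about how Gramps itself stores or sorts data.

```python
import sqlite3

DANISH = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
RANK = {letter: i for i, letter in enumerate(DANISH)}


def danish_key(name: str) -> tuple:
    # Unknown characters sort after the alphabet, by code point (a simplification).
    return tuple(RANK.get(ch, len(DANISH) + ord(ch)) for ch in name.upper())


def danish_collation(a: str, b: str) -> int:
    """SQLite collation callback: return negative, zero or positive, like a comparator."""
    ka, kb = danish_key(a), danish_key(b)
    return (ka > kb) - (ka < kb)


con = sqlite3.connect(":memory:")
con.create_collation("danish", danish_collation)
con.execute("CREATE TABLE surname (name TEXT)")
con.executemany("INSERT INTO surname VALUES (?)",
                [(n,) for n in ["ÅKESSON", "JØRGENSEN", "JORY", "BÆK", "CABLE"]])

rows = con.execute("SELECT name FROM surname ORDER BY name COLLATE danish").fetchall()
print([name for (name,) in rows])  # ['BÆK', 'CABLE', 'JORY', 'JØRGENSEN', 'ÅKESSON']
con.close()
```

Registering one collation per language would let the same table be ordered Danish-style or Swedish-style simply by changing the `COLLATE` clause.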

-Brian

On Mon, Dec 2, 2019 at 5:28, Erik P. Olsen
Same with Danish.

--
Erik


On 2019-12-02 at 11:26:40 StoltHD wrote:

>  This apply to most Unicode doublebyte characters, Norwegian has the same
> problem...
>
> But it sort correctly when using Norwegian Language as computer input
> language and Norwegian in Gramps... But If I switch to English on a
> Computer with Norwegian Windows, it sort the wrong way...
>
> So this is not a OS/Windows problem, it is a Gramps/Python Problem when
> English is used as Language in Gramps...
>
> Jaran
>
> man. 2. des. 2019 kl. 10:58 skrev StoltHD <[hidden email]>:
>
> > This apply to most Unicode doublebyte characters, Norwegian has the same
> > problem...
> >
> > man. 2. des. 2019 kl. 04:48 skrev Emyoulation--- via Gramps-users <
> > [hidden email]>:
> >
> >> Adrian,
> >>
> >> This is a thorny issue. And Swedish/Danish difficulties have been
> >> specifically been topics of discussion all the way back in the version 3
> >> documentation.
> >>
> >> You probably want to explore the Alternative Name options.  They allow
> >> you to override sorting, grouping & make to make the Latin variant
> >> invisible yet searchable ... all independently of the Display name.
> >>
> >>
> >> https://www.gramps-project.org/wiki/index.php/Gramps_5.0_Wiki_Manual_-_Entering_and_editing_data:_detailed_-_part_3
> >>
> >> There have been recent discussions about the Alternative Name feature
> >> that you would've missed as a new subscriber to this maillist.
> >> A couple of these archived threads are linked in the 'See also' section
> >> at the bottom of the following page:
> >> https://gramps-project.org/wiki/index.php/Grouping_Surnames
> >> (This is a good way to be introduced to exploring the archives too.)
> >>
> >> -Brian
> >>

--
Erik P. Olsen - Copenhagen, Denmark
Fedora 30/64 bit Linux xfce Claws-Mail POP3 Gramps 5.1.1


--
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users
https://gramps-project.org

Re: Sort order of non-latin characters in Gramps

StoltHD
The problem is that in, e.g., Swedish and German the character "Ö/ö" sorts after Z in Swedish but before Z in German... This is just one example... This document at unicode.org describes almost everything about the standard...

https://unicode.org/reports/tr10/
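To make the Swedish/German contrast concrete, here is a toy Python sketch. It is not the Unicode Collation Algorithm from TR10, and not what Gramps or PyICU actually do; the functions `swedish_key` and `german_key` are hypothetical, and both "alphabets" are deliberately simplified (upper case only, no other special letters).

```python
import unicodedata

# Toy Swedish alphabet (simplified assumption): Å, Ä and Ö are separate letters after Z.
SWEDISH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"
SWEDISH_RANK = {letter: rank for rank, letter in enumerate(SWEDISH_ALPHABET)}


def swedish_key(name):
    """Rank each letter by its position in the toy Swedish alphabet.

    Characters outside that alphabet sort after it, by code point.
    """
    return [SWEDISH_RANK.get(c, len(SWEDISH_RANK) + ord(c)) for c in name.upper()]


def german_key(name):
    """Toy German dictionary-style key: Ö compares like O, Ä like A, Ü like U.

    The accent-stripped spelling decides first; the original string only breaks ties.
    """
    decomposed = unicodedata.normalize("NFD", name.upper())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, name)


names = ["ZANDER", "ÖBERG", "OLSSON"]
print(sorted(names, key=swedish_key))  # ['OLSSON', 'ZANDER', 'ÖBERG']
print(sorted(names, key=german_key))   # ['ÖBERG', 'OLSSON', 'ZANDER']
```

The same three names come out in two different orders, which is why a single language-independent sort order cannot satisfy both readerships.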

On Mon, 2 Dec 2019 at 13:18, Emyoulation--- via Gramps-users <[hidden email]> wrote:

There was an interesting discussion here about how differently various languages sort Unicode:
https://stackoverflow.com/questions/16538605/utf8-collation-difference-between-unicode-and-danish

One point was that some languages even have MULTIPLE sort rules, depending on the application. (Their example was German, where dictionaries and phone directories sort umlauted letters differently.)

Equally interesting is a discussion of SQLite's approach to Unicode, which follows a BYOS (Bring Your Own Sorter) policy.
See:

Since Gramps is Open Source, maybe there's an opportunity for someone to add dynamic sorting-policy options to the Internationalization features of our favorite program.

-Brian

On Mon, Dec 2, 2019 at 5:28, Erik P. Olsen
Same with Danish.

--
Erik


On 2019-12-02 at 11:26:40 StoltHD wrote:

>  This applies to most Unicode double-byte characters; Norwegian has the same
> problem...
>
> But it sorts correctly when I use Norwegian as the computer's input
> language and Norwegian in Gramps... But if I switch Gramps to English on a
> computer with Norwegian Windows, it sorts the wrong way...
>
> So this is not an OS/Windows problem; it is a Gramps/Python problem when
> English is used as the language in Gramps...
>
> Jaran
>

Re: Sort order of non-latin characters in Gramps

Patrice Legoux

On Mon, 2 Dec 2019 at 14:20, StoltHD <[hidden email]> wrote:
The problem is that in, e.g., Swedish and German the character "Ö/ö" sorts after Z in Swedish but before Z in German... This is just one example... This document at unicode.org describes almost everything about the standard...

https://unicode.org/reports/tr10/


Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by StoltHD
Regarding the second part of Adrian's inquiry: does Gramps have any automagic diacritics-ignoring functionality for searching?

For example, is there a way to make the 'Find' return all instances of ÅKESSON when searching for AKESSON? (Other than adding an anglicized Alternate Name to each Person with ÅKESSON as the Preferred Name.)

And I seem to recall reading in the Wiki that the Search Bar does not consider Alternate Names, while the Filter Gramplet does. Is that still the case?

-Brian
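Purely to illustrate what such a feature could do (Gramps does not currently provide it), here is a minimal Python sketch of diacritics-ignoring matching: decompose to NFD, drop the combining marks, and casefold. The helpers `fold` and `name_contains` are hypothetical. One catch worth knowing: NFD does not decompose every Nordic letter. Ø and Æ have no canonical decomposition, so the sketch maps them explicitly; that small table is an assumption and far from complete.

```python
import unicodedata

# Letters with no canonical decomposition, which NFD alone would leave untouched.
# This table is a small illustrative assumption, not a complete folding scheme.
EXTRA_FOLDS = str.maketrans({"Ø": "O", "ø": "o", "Æ": "AE", "æ": "ae"})


def fold(text):
    """Map text to an accent-free, case-insensitive form for matching."""
    decomposed = unicodedata.normalize("NFD", text.translate(EXTRA_FOLDS))
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def name_contains(name, query):
    """A 'Name contains' test that ignores diacritics and letter case."""
    return fold(query) in fold(name)


print(name_contains("ÅKESSON", "AKESSON"))      # True
print(name_contains("JØRGENSEN", "Jorgensen"))  # True
print(name_contains("LAINÉ", "laine"))          # True
```

Folding deliberately throws away the distinctions a Swedish or Danish reader cares about, so it suits a search box better than an index sort order.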



Re: Sort order of non-latin characters in Gramps

Dave Scheipers
> And I seem to recall reading something in the Wiki -- that the Search Bar will not consider the Alternate Names while the Filter Gramplets does. Is that still the case?
>
> -Brian

I just did a quick check. All of my males with the suffix Sr have the
suffixed name entered as an alternative name.

The Sidebar Filter gramplet found the records. The Search bar did not;
it only found preferred names with "Sr" within the name. Lots of
people with the name "Israel"!

Dave using 5.1.1



Re: Sort order of non-latin characters in Gramps

Dave Scheipers
After I posted, I think I understood the Search bar filter. The bar
only searches the displayed columns. The preferred names are
displayed, not the alternative names. Adding and removing columns
changes the Search bar options.

The Sidebar filter, and of course filter rules, search the full record.

So as I see it, there is no way to make the Search bar dig
deeper into the full records.

Dave
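Dave's observation can be modelled in a few lines of Python. This is a hypothetical sketch of the behaviour he describes, not Gramps code: the `Person` class and both match functions are made up for illustration. The search bar only looks at what the view's columns display (here, the preferred name), while the sidebar filter walks every name on the record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    preferred_name: str
    alternate_names: List[str] = field(default_factory=list)


def search_bar_matches(person, text):
    """Like the search bar: only the displayed column (the preferred name) is checked."""
    return text.lower() in person.preferred_name.lower()


def sidebar_filter_matches(person, text):
    """Like the sidebar filter: every name on the full record is checked."""
    names = [person.preferred_name] + person.alternate_names
    return any(text.lower() in name.lower() for name in names)


people = [
    Person("Smith, John", ["Smith, John Sr"]),
    Person("Jones, Israel"),
]
print([p.preferred_name for p in people if search_bar_matches(p, "Sr")])
# ['Jones, Israel']
print([p.preferred_name for p in people if sidebar_filter_matches(p, "Sr")])
# ['Smith, John', 'Jones, Israel']
```

This reproduces both symptoms: the "Sr" alternate name is invisible to the column search, and "Israel" matches because it happens to contain "sr".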


Re: Sort order of non-latin characters in Gramps

StoltHD
There is a SoundEx gramplet, but I had never thought about it...

But in the Filter Gramplet I can't even type the characters "ö" and "ä"... But when I search for "Go" I do not get "Göransson" as part of the result...
It does not matter whether I use English or Norwegian as the language for Gramps.

And it's the same for the Search bar...






Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by Dave Scheipers
Interesting.  The Search Bar portion of the Wiki still needs some tweaking. This discussion is good fodder for that.

As is:

-Brian


Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by StoltHD
On Mon, Dec 2, 2019 at 10:03, StoltHD
There is a SoundEX gramplet, but I have never thought about it...
-- 

When I first started using Gramps, I had hoped for a usable phonetic fuzzy search (SoundEx, Daitch–Mokotoff Soundex, NYSIIS, Metaphone, Double Metaphone, Metaphone 3, et cetera). (It seemed like that would help with my McCullough surname spelling-variant problem.)

Then, when I saw Alternate Names, I hoped for a utility that would add a searchable SoundEx attribute to every Name in the Tree.

Recalculating SoundEx for an entire database can be CPU-intensive, so caching the SoundEx "cookies" (codes) improves the performance of a 'sounds like' search. But the cookies occasionally require manual tuning, so GUI access to them would be good.

But so far I've found only two SoundEx features:

1) an option when searching duplicate Persons for merging

2) the SoundEx gramplet, which seems to be limited to being a SoundEx calculator.

Since raw SoundEx codes are not recognizable by mere humans, a good fuzzy search could work like the recent lat/long parser enhancement: you would type in a human-readable but parsable term, and the calculated form would appear in an adjacent (and tweakable!) field.

(Why tweak? Sometimes silent letters throw off the evaluation. Or say you wanted both the prefixed and unprefixed versions of 'Van Deusen' to be searchable: you could encode it as 'Deusen VanDeusen', resulting in 'D-250 V-532', and the search could find both.)

-Brian
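For readers who have not met SoundEx, below is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map consonants to digits, let vowels (and Y) separate repeated digits while H and W do not, then pad or cut to four characters. It is not the code behind the Gramps SoundEx gramplet, whose details may differ; the accent-stripping NFD step is an added assumption. It reproduces the 'Deusen VanDeusen' example above (without the hyphens).

```python
import unicodedata

# American Soundex digit groups; letters not listed (vowels, H, W, Y) get no digit.
SOUNDEX_DIGITS = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_DIGITS[letter] = digit


def soundex(name):
    """Return the four-character American Soundex code for a single word."""
    # Assumption: strip accents first, so that e.g. Å is coded like A.
    decomposed = unicodedata.normalize("NFD", name.upper())
    letters = [c for c in decomposed if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_DIGITS.get(letters[0], "")
    for letter in letters[1:]:
        digit = SOUNDEX_DIGITS.get(letter, "")
        if digit and digit != previous:
            code += digit
            if len(code) == 4:
                break
        if letter not in "HW":  # H and W do not separate equal digits; vowels do
            previous = digit
    return (code + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))                      # R163 R163
print(" ".join(soundex(w) for w in "Deusen VanDeusen".split()))  # D250 V532
print(soundex("ÅKESSON") == soundex("AKESSON"))                  # True
```

Storing such codes alongside each name would be one way to get the cached, tweakable field described above.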



Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by adrian.davey
The sort order in the UI is determined by your locale.  If available,
the pyICU module is used for sorting.  It is strongly recommended that
you install this if your data includes non-latin characters.  Reports,
for example the complete individual report, should use a locale
consistent with the language setting.  The locale used for sorting
cannot be set independently from the language.

Searching should return an exact unicode match.  "ÅKESSON" is not the
same as "AKESSON".  At present, there is no fuzzy matching functionality.

Nick.
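A minimal Python sketch of the pattern described above (use PyICU when it is installed, otherwise fall back on the operating system's locale) might look like the following. It is an illustration, not the GrampsLocale code; the `make_sort_key` name and the choice of Swedish (`sv_SE`) are assumptions for the example, and whether the fallback locale exists depends on the machine.

```python
import locale


def make_sort_key(locale_name="sv_SE"):
    """Return a key function for sorted(), preferring PyICU when it is installed."""
    try:
        from icu import Collator, Locale  # PyICU (optional dependency)
    except ImportError:
        pass
    else:
        collator = Collator.createInstance(Locale(locale_name))
        return collator.getSortKey
    try:
        # Note: setlocale changes process-wide state.
        locale.setlocale(locale.LC_COLLATE, locale_name + ".UTF-8")
    except locale.Error:
        # Locale not installed: strxfrm keeps using the current LC_COLLATE,
        # which for a Python process is normally "C" (plain code-point order).
        pass
    return locale.strxfrm


names = ["ÅKESSON", "ZANDER", "AITKEN"]
print(sorted(names, key=make_sort_key("sv_SE")))
# expected: ['AITKEN', 'ZANDER', 'ÅKESSON']
```

Passing "en_AU" instead should typically put ÅKESSON back among the A names, matching Adrian's observation, although with neither PyICU nor that locale available the fallback simply stays in code-point order.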



Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by GRAMPS - User mailing list
On 02/12/2019 12:14, Emyoulation--- via Gramps-users wrote:
> But equally interesting is a discussion mentioning SQLite's approach to Unicode, which follows a BYOS (Bring Your Own Sorter) policy.
> See:

We already create custom collations using the collation order from the GrampsLocale class.  This ensures that database queries use the same sort order as other parts of Gramps.
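For readers unfamiliar with SQLite collations, here is a minimal sketch of the idea using Python's standard sqlite3 module: `Connection.create_collation()` registers a comparison callback, and SQLite then uses it for `ORDER BY ... COLLATE`. The simplified Swedish alphabet table and the collation name `swedish` are assumptions for illustration only; Gramps takes its order from GrampsLocale, not from a hand-written table like this one.

```python
import sqlite3

# Hypothetical, simplified Swedish alphabet: A-Z followed by Å, Ä, Ö.
# Gramps takes its order from GrampsLocale; this table only illustrates the idea.
SWEDISH_ORDER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ORDER)}


def swedish_key(text):
    """Map each character to its alphabet position; unknown characters sort
    after Ö, ordered by code point."""
    return tuple(RANK.get(ch, len(SWEDISH_ORDER) + ord(ch)) for ch in text.upper())


def swedish_collation(a, b):
    """SQLite collation callback: return negative, zero or positive like strcmp."""
    ka, kb = swedish_key(a), swedish_key(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("swedish", swedish_collation)
conn.execute("CREATE TABLE person (surname TEXT)")
conn.executemany(
    "INSERT INTO person (surname) VALUES (?)",
    [(n,) for n in ["ÅKESSON", "AITKEN", "ÖBERG", "ZETTERBERG", "ÄNGQUIST", "AKERS"]],
)
rows = [r[0] for r in conn.execute(
    "SELECT surname FROM person ORDER BY surname COLLATE swedish")]
print(rows)  # ['AITKEN', 'AKERS', 'ZETTERBERG', 'ÅKESSON', 'ÄNGQUIST', 'ÖBERG']
```

A Danish order (Z, then Æ, Ø, Å) would only need a different table; a real implementation would delegate the comparison to a locale-aware collator rather than a fixed string.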


Nick.





Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by GRAMPS - User mailing list
On 02/12/2019 14:28, Emyoulation--- via Gramps-users wrote:
> Does Gramps have any automagic diacritic-insensitive functionality for searching?


Not that I am aware of.
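Since Gramps does not do this, here is a minimal sketch of what diacritic-insensitive matching could look like in Python, for anyone curious. It decomposes text with `unicodedata.normalize("NFD", ...)`, drops combining marks and compares case-folded strings. Letters such as Ø and Æ have no Unicode decomposition, so the hypothetical `EXTRA_FOLDS` table handles them explicitly. The function names are illustrative and not part of Gramps.

```python
import unicodedata

# Letters with no canonical decomposition must be folded by hand.
# This is a hypothetical, deliberately small table.
EXTRA_FOLDS = {"ø": "o", "æ": "ae", "œ": "oe", "ł": "l", "đ": "d"}


def fold(text):
    """Case-fold text and strip diacritics, e.g. 'JØRGENSEN' -> 'jorgensen'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)


def name_contains(name, query):
    """Accent-insensitive version of a 'Name contains' search."""
    return fold(query) in fold(name)


print(name_contains("ÅKESSON", "AKESSON"))      # True
print(name_contains("JØRGENSEN", "JORGENSEN"))  # True
print(name_contains("Göransson", "Go"))         # True
```

Folding is meant only for matching. For sorting, a Scandinavian reader wants the opposite: Å, Ä and Ö kept distinct and placed after Z.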



> And I seem to recall reading in the Wiki that the Search Bar will not consider Alternate Names, while the Filter Gramplet does. Is that still the case?


Yes.


Nick.





Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by Dave Scheipers
Correct.

On 02/12/2019 15:10, Dave Scheipers wrote:

> After I posted, I think I now understand the Search bar filter. The bar
> only searches the displayed columns. The preferred names are
> displayed, not the alternative names. Adding and removing columns
> changes the Search bar options.
>
> The Sidebar filter, and of course filter rules, search the full record.
>
> So as I see it, there would be no way to have the search bar dig
> deeper into the full records.
>
> Dave
>
> On Mon, Dec 2, 2019 at 10:00 AM Dave Scheipers <[hidden email]> wrote:
>>> And I seem to recall reading in the Wiki that the Search Bar will not consider Alternate Names, while the Filter Gramplet does. Is that still the case?
>>>
>>> -Brian
>> I just did a quick check. All of my males with the suffix "Sr" have
>> it entered as an alternative name.
>>
>> The Sidebar Filter gramplet found the records. The Search bar did not;
>> it only found preferred names containing "Sr", which matched lots of
>> people named "Israel".
>>
>> Dave using 5.1.1
>
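The distinction Dave describes can be modelled in a few lines. This is a hypothetical sketch, not Gramps code: the `Person` class, its fields and both match functions are invented for illustration. `search_bar_match` looks only at the displayed (preferred) name, as the Search bar does, while `sidebar_filter_match` also checks alternate names, as the Sidebar filter does.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical stand-in for a Gramps person record."""
    preferred_name: str
    alternate_names: list = field(default_factory=list)


def search_bar_match(person, text):
    """Like the Search bar: only the displayed (preferred) name is checked."""
    return text.lower() in person.preferred_name.lower()


def sidebar_filter_match(person, text):
    """Like the Sidebar filter: every name on the record is checked."""
    names = [person.preferred_name, *person.alternate_names]
    return any(text.lower() in name.lower() for name in names)


people = [
    Person("John Smith", ["John Smith Sr"]),
    Person("Israel Jones"),
]

print([p.preferred_name for p in people if search_bar_match(p, "Sr")])
# ['Israel Jones']
print([p.preferred_name for p in people if sidebar_filter_match(p, "Sr")])
# ['John Smith', 'Israel Jones']
```

This reproduces Dave's observation: a substring search for "Sr" in the preferred name alone misses the alternate-name record and matches "Israel" instead.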




Re: Sort order of non-latin characters in Gramps

GRAMPS - User mailing list
In reply to this post by StoltHD
On 02/12/2019 16:00, StoltHD wrote:
> There is a Soundex gramplet, but I have never thought about it...

See:

1768: Improve soundex tool.  Integrate better with Gramps.

https://gramps-project.org/bugs/view.php?id=1768
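For reference, here is a minimal sketch of classic American Soundex in Python, with an optional diacritic-stripping step. This is not the code of the Gramps Soundex tool; the function names and the stripping step are assumptions. It shows why folding matters: plain Soundex only understands A-Z, so a name that begins with Å loses its initial letter unless that letter is folded to A first.

```python
import unicodedata

# Classic American Soundex digit groups.
CODES = {ch: digit
         for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6")]
         for ch in letters}


def strip_diacritics(text):
    """Hypothetical pre-processing: remove combining marks (Å -> A, Ö -> O).
    Letters without a decomposition, such as Ø, are left unchanged."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def soundex(name):
    """Return the 4-character American Soundex code; letters outside A-Z are ignored."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return result.ljust(4, "0")


print(soundex("ÅKESSON"))                    # K250 (Å silently dropped)
print(soundex(strip_diacritics("ÅKESSON")))  # A225, same as AKESSON
```

Folding Ø would still need a hand-made table, just as with any diacritic-insensitive matching.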


>
> But in the filter Gramplet I can't even type the characters "ö" and "ä"
> ... And when I search for "Go" I do not get "Göransson" in the
> results...
> It does not matter whether I use English or Norwegian as the language for Gramps.


What OS are you using?

Have you configured the compose key?


Nick.





Re: Sort order of non-latin characters in Gramps

StoltHD
I use Windows 10, and no, I have not configured any compose key. I don't even know what that is or where to find it.
This is a "plain" install of 5.1.1 with the MongoDB module enabled (it works fine; only the "first read" is a little delayed).

Jaran

On Tue, 3 Dec 2019 at 11:56, Nick Hall via Gramps-users <[hidden email]> wrote:

