Sort mystery

Peter Landgren

Hi,

I need a hint to understand sorting in Python.

For an example, see:

http://www.gramps-project.org/bugs/view.php?id=2933

As can be seen, surnames beginning with V and W are sorted "together!", not separately.

If I change the sort call from xxx.sort(locale.strcoll) to just xxx.sort(), V and W are sorted separately, but of course names beginning with non-ASCII letters are then sorted incorrectly.

Why are V and W treated differently by the two sort methods?

I have seen this problem not only in the Narrated Web Report, but also in the Place view.

Note that I'm using a Swedish locale. How does it work with other locales?
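A note for readers on Python 3: list.sort() no longer accepts a comparison function, so xxx.sort(locale.strcoll) becomes sorted(names, key=locale.strxfrm) (or key=functools.cmp_to_key(locale.strcoll)). The sketch below puts the two calls side by side, assuming Python 3; the names come from the bug report, and the helper names codepoint_sort and locale_sort are hypothetical, not Gramps code.

```python
import locale

# A few surnames from bug 2933: V and W names, a lower-case "von" and a name with "ä".
NAMES = ["Wallin", "Vallin", "Widén", "Viderstedt", "von Schinkel", "Vätz", "Woolford"]


def codepoint_sort(names):
    """Like xxx.sort(): compares Unicode code points and ignores the locale."""
    return sorted(names)


def locale_sort(names):
    """Like xxx.sort(locale.strcoll): uses the LC_COLLATE rules currently in effect."""
    return sorted(names, key=locale.strxfrm)


if __name__ == "__main__":
    # Python starts with LC_COLLATE set to "C"; adopt the user's settings instead.
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        pass  # the configured locale is not installed: keep "C"
    print("code points:", codepoint_sort(NAMES))
    print("locale     :", locale_sort(NAMES))
```

Under the C locale both lines should match; under a Swedish locale the second line may interleave V and W names as in the bug report.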

/Peter


------------------------------------------------------------------------------
Stay on top of everything new and different, both inside and
around Java (TM) technology - register by April 22, and save
$200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
300 plus technical and hands-on sessions. Register today.
Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: Sort mystery

Peter Landgren

On Saturday 18 April 2009 16.30.00, Peter Landgren wrote:
> Note I'm using Swedish locale. How does it work with other locales?

I tested with my system set to an English locale, and then V and W are sorted as separate letters. So are V and W treated as the same letter in non-English locales?

/Peter



Re: Sort mystery

Serge Noiraud-2
In reply to this post by Peter Landgren
Peter Landgren wrote:
> If I change the sort method from xxx.sort(locale.strcoll) to just
> xxx.sort() I get V and W sorted separate, but of course names beginning
> with non ascii letters fail.
I think the difference is the following:

The sort function without an argument does not use the locale at all:
it compares the character codes, which gives the same order as the C
locale. For plain letters the C collating sequence is the ASCII
sequence, so "V" and "W" are different characters.

When you pass locale.strcoll, you use the collating sequence of your
locale (set by the LC_* and LANG environment variables).

At the prompt, the locale command shows your settings for that.
Depending on them, "V" and "W" can be treated as the same letter for
sorting.

So the problem is not Python, but the collating sequence of your
locale. In your locale, "V" and "W" are treated as the same letter.
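Which of those variables actually decides the collation follows the POSIX precedence rules. The sketch below assumes those rules; the helper name effective_collation is hypothetical. It only reads the environment: it does not check that the locale is installed, and a Python program still has to call locale.setlocale(locale.LC_COLLATE, "") (or LC_ALL) before locale.strcoll or locale.strxfrm follow it.

```python
import os


def effective_collation(env):
    """Return the locale name that governs sorting (LC_COLLATE) for `env`.

    POSIX precedence: a non-empty LC_ALL overrides everything, then a
    non-empty LC_COLLATE, then LANG. With none of them set, the "C"
    (a.k.a. POSIX) locale is assumed here.
    """
    for variable in ("LC_ALL", "LC_COLLATE", "LANG"):
        value = env.get(variable, "")
        if value:
            return value
    return "C"


if __name__ == "__main__":
    # Roughly what the `locale` command shows on its LC_COLLATE line.
    print(effective_collation(os.environ))
```

This is why LANG=sv_SE.utf8 alone gives Swedish collation, while LC_COLLATE=C on top of it switches only sorting back to C.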


Re: Sort mystery

Peter Landgren

Yes,

You are probably right. In Swedish, W (and Q) is only used in names, to make them look better than with a V (or K). V and W are pronounced the same.

There are no "normal" Swedish words beginning with W (or Q), except perhaps a few borrowed foreign words. However, the result in the Narrated Web Report does not look very appealing!

Any idea how to group V and W separately just for this purpose?

/Peter

On Saturday 18 April 2009 17.32.33, Serge Noiraud wrote:
> [...]



Re: Sort mystery

Serge Noiraud-2
Peter Landgren wrote:
> Any idea how to collect V and W separately just for this purpose?
I don't know whether it is possible to select a locale for a report.
I think that feature is not implemented.

The only way to create such a report would be to launch Gramps from the
command line, specifying the locale. The page below explains this:

http://gramps-project.org/wiki/index.php?title=Gramps_3.0_Wiki_Manual_-_Command_Line

Take the example at the end of that page, which creates a report:
gramps -O file.grdb -a report -p name=timeline,off=pdf,of=my_timeline.pdf

To create the same report with a specific locale, run:
LC_COLLATE=C gramps -O file.grdb -a report -p name=timeline,off=pdf,of=my_timeline.pdf

LC_COLLATE is the environment variable that controls sorting.
Here I selected C, which means ASCII order.
Be careful: you could get unpredictable results with UTF-8 databases,
because non-ASCII letters such as "ä" are then sorted by their code
values, after "z".
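Inside Python, the effect of prefixing the command with LC_COLLATE=C can be had for a single sort by switching only the LC_COLLATE category. A minimal sketch, assuming Python 3; collation and sort_with_collation are hypothetical helpers, not part of Gramps. Since setlocale() changes process-wide state and is not thread-safe, this only suits short, single-threaded sections.

```python
import locale
from contextlib import contextmanager


@contextmanager
def collation(name):
    """Temporarily set only the LC_COLLATE category to `name`, then restore it.

    Raises locale.Error if `name` is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # no second argument: query only
    locale.setlocale(locale.LC_COLLATE, name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def sort_with_collation(names, name):
    """Sort `names` using the collation rules of locale `name`, e.g. "C" or "sv_SE.UTF-8"."""
    with collation(name):
        return sorted(names, key=locale.strxfrm)


if __name__ == "__main__":
    # In-process counterpart of running with LC_COLLATE=C.
    print(sort_with_collation(["Wallin", "Vätz", "Vallin", "von Schinkel"], "C"))
    # ['Vallin', 'Vätz', 'Wallin', 'von Schinkel']
```

Note how "von Schinkel" lands after the W names and "Vätz" after "Vallin": that is the C ordering Serge warns about.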

Hope it helps

Serge


Re: Sort mystery

Peter Landgren

Serge,

Thanks for all the info. However, there are no problems with reports other than the Narrated Web Report, where surnames and places are listed alphabetically under their first letter.

This does not look nice; there should be one entry for V and one for W:

Letter  Surname              Count
W       Wacker                   1
        Wagner                   1
        Wahlén                   2
V       Vahlström                1
W       Wahlström               17
        Waling                   2
V       Vallenius                1
W       Wallenstrand             3
V       Vallgren                 8
W       Wallgren                15
V       Vallin                   5
W       Wallin                   2
        Wallon                   2
        Watz                     1
        Wentzell                 1
        Westberg                 1
        Westerberg               1
        Westerdahl              13
        Westerlund               2
        Westin                   1
        Westman                 13
        Westmark                 1
        Wettersten               1
        Whippa-Danielsson        1
        Wickberg                 1
        Widegren                 1
        Widén                    3
V       Viderstedt              14
W       Widgren                  4
        Wiggman                  1
V       Vigsten                  3
        Vikberg                  1
        Viklund                  5
        Viksten                  1
        Vikström                 2
        Vilhelmsson Burén        2
        Viman                    1
        Vinberg                  7
        Vinkvist                 3
W       Winqvist                 2
        Wirström                 5
        Wistrand                 2
        Witt                     1
V       von Schinkel             3
V       von Vicken               1
W       Woolford                 1
V       Vätz                     1

(Copied from the web page, with no style applied.)

/Peter

--
Peter Landgren
Talken Hagen
671 94 BRUNSKOG
0570-530 21
070-635 4719
[hidden email]
Skype: pgl4820.2



Re: Sort mystery

Peter Landgren
In reply to this post by Serge Noiraud-2

The previous mail was too big for the list.

/Peter

On Saturday 18 April 2009 20.13.00, Serge Noiraud wrote:
> [...]


Re: Sort mystery

Serge Noiraud-2
In reply to this post by Peter Landgren
Hi Peter,
perhaps I have found the problem.

Here is how I found it, and the fix works for me:

Peter Landgren wrote:
> Serge,
>
> Thanks for all info. However, there are no problems with other reports
> than Narr. Web Report, where the alphabetic order using the first letter
> in the surnames or places is used.
>
> This does not look nice; there should be one entry for V and one for W.
...
OK
I created a file with your names and installed the Swedish locale on my machine.
First I sorted it with the French locale:
#locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=
#sort sort.test
Vahlström
Vallenius
Vallgren
Vallin
Vätz
Viderstedt
Vigsten
Vikberg
Viklund
Viksten
Vikström
Vilhelmsson Burén
Viman
Vinberg
Vinkvist
von Schinkel
von Vicken
Wacker
Wagner
Wahlén
Wahlström
Waling
Wallenstrand
Wallgren
Wallin
Wallon
Watz
Wentzell
Westberg
Westerberg
Westerdahl
Westerlund
Westin
Westman
Westmark
Wettersten
Whippa-Danielsson
Wickberg
Widegren
Widén
Widgren
Wiggman
Winqvist
Wirström
Wistrand
Witt
Woolford
#
That looks correct to me.

Now with the Swedish locale:
#export LANG=sv_SE.utf8
#locale
LANG=sv_SE.utf8
LC_CTYPE="sv_SE.utf8"
LC_NUMERIC="sv_SE.utf8"
LC_TIME="sv_SE.utf8"
LC_COLLATE="sv_SE.utf8"
LC_MONETARY="sv_SE.utf8"
LC_MESSAGES="sv_SE.utf8"
LC_PAPER="sv_SE.utf8"
LC_NAME="sv_SE.utf8"
LC_ADDRESS="sv_SE.utf8"
LC_TELEPHONE="sv_SE.utf8"
LC_MEASUREMENT="sv_SE.utf8"
LC_IDENTIFICATION="sv_SE.utf8"
LC_ALL=
#sort sort.test
Wacker
Wagner
Wahlén
Vahlström
Wahlström
Waling
Vallenius
Wallenstrand
Vallgren
Wallgren
Vallin
Wallin
Wallon
Watz
Wentzell
Westberg
Westerberg
Westerdahl
Westerlund
Westin
Westman
Westmark
Wettersten
Whippa-Danielsson
Wickberg
Widegren
Widén
Viderstedt
Widgren
Wiggman
Vigsten
Vikberg
Viklund
Viksten
Vikström
Vilhelmsson Burén
Viman
Vinberg
Vinkvist
Winqvist
Wirström
Wistrand
Witt
von Schinkel
von Vicken
Woolford
Vätz
#
I get the same result as the Python sort.
Now with LC_COLLATE=C and the Swedish locale:
#LC_COLLATE=C sort sort.test
Vahlström
Vallenius
Vallgren
Vallin
Viderstedt
Vigsten
Vikberg
Viklund
Viksten
Vikström
Vilhelmsson Burén
Viman
Vinberg
Vinkvist
Vätz
Wacker
Wagner
Wahlström
Wahlén
Waling
Wallenstrand
Wallgren
Wallin
Wallon
Watz
Wentzell
Westberg
Westerberg
Westerdahl
Westerlund
Westin
Westman
Westmark
Wettersten
Whippa-Danielsson
Wickberg
Widegren
Widgren
Widén
Wiggman
Winqvist
Wirström
Wistrand
Witt
Woolford
von Schinkel
von Vicken
#
That also looks correct to me, and it is the result you should get.
But if we want "v" and "V" to be grouped together, we should compare the first letter in uppercase: in the list above, "von Schinkel" and "von Vicken" end up after all the W names.
I have the same problem in French: with Noiraud and noiraud, I get two "N" entries.
So the ASCII sort is not the solution.
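One possible way to build the index groups along these lines, sketched under assumptions (this is not the code in NarrativeWeb.py, and the helpers index_letter and group_by_letter are hypothetical): sort on the collation key of the upper-cased first letter first, then on the collation key of the whole name, and group consecutive names with itertools.groupby. Because the letter alone leads the key, V names cannot interleave with W names, provided the locale gives "V" and "W" different collation keys. Serge's Swedish output above, where Vahlström sorts just before Wahlström, suggests they do differ at a lower level there, but that is worth checking.

```python
import locale
from itertools import groupby
from unicodedata import normalize


def index_letter(surname):
    """Index letter for a (non-empty) surname: NFKC-normalized, upper-cased first character."""
    return normalize("NFKC", surname)[0].upper()


def group_by_letter(surnames):
    """Return [(letter, [surnames, ...]), ...] for an alphabetical index.

    The sort key starts with the collation key of the index letter alone, so
    all names filed under one letter are contiguous, provided the current
    locale gives different letters different collation keys. Within a letter,
    names follow the locale's normal collation.
    """
    def sort_key(name):
        return (locale.strxfrm(index_letter(name)), locale.strxfrm(name))

    ordered = sorted(surnames, key=sort_key)
    return [(letter, list(group)) for letter, group in groupby(ordered, key=index_letter)]


if __name__ == "__main__":
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        pass  # keep the "C" collation
    for letter, names in group_by_letter(["Wallin", "Vallin", "von Schinkel", "Noiraud", "noiraud"]):
        print(letter, names)
```

Upper-casing the letter puts Noiraud and noiraud under one "N", and "von Schinkel" under "V".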


It would be useful to know how this works for other languages:
Chinese, Cyrillic, Indian scripts and others?

There are three bugs in NarrativeWeb.py:

 1: normalize() is called with NFC in one place and NFKC everywhere else.
    We should use either NFC or NFKC, but not mix the two forms.
 2: One sort call does not use locale.strcoll.
 3: There is a problem with the following statement:
    if letter is not last_letter:

Here is a patch you can test. It works for me:

--- NarrativeWeb.py     (revision 12473)
+++ NarrativeWeb.py     (working copy)
@@ -1509,7 +1509,7 @@
                 index_val = "%90d_%s" % (999999999-len(data_list), surname)
                 temp_list[index_val] = (surname, data_list)
             temp_keys = temp_list.keys()
-            temp_keys.sort()
+            temp_keys.sort(locale.strcoll)
             person_handle_list = []
             for key in temp_keys:
                 person_handle_list.append(temp_list[key])
@@ -1525,7 +1525,7 @@
             # the surname
             letter = normalize('NFKC', surname)[0].upper()

-            if letter is not last_letter:
+            if letter != last_letter:
                 last_letter = letter
                 of.write('\t\t<tr class="BeginLetter">\n')
                 of.write('\t\t\t<td class="ColumnLetter"><a name="%s">%s</a></td>\n'
@@ -3602,7 +3602,7 @@
             keyname = get_place_keyname(db, handle)

         if keyname:
-            c = normalize('NFC', keyname)[0].upper()
+            c = normalize('NFKC', keyname)[0].upper()
             first_letters.append(c)

     return first_letters

If it works and everybody agrees, I'll commit this patch.
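Why mixing NFC and NFKC matters for bug 1: NFC only composes canonically equivalent sequences, while NFKC also folds compatibility characters into their plain forms, so the same name can be filed under two different index letters. A small illustration, assuming Python 3; the surname written with a fullwidth W is a made-up input and first_letter is a hypothetical helper.

```python
from unicodedata import normalize


def first_letter(text, form):
    """Index letter of `text` after Unicode normalization `form` ("NFC" or "NFKC")."""
    return normalize(form, text)[0].upper()


# U+FF37 FULLWIDTH LATIN CAPITAL LETTER W is a compatibility character, so only
# the compatibility forms (NFKC/NFKD) fold it to a plain "W".
surname = "\uff37allin"

print(first_letter(surname, "NFC"))   # the fullwidth letter, unchanged
print(first_letter(surname, "NFKC"))  # plain "W"
```

For ordinary names such as "Wallin" both forms agree, which is why the mismatch rarely shows up.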


Re: Sort mystery

Serge Noiraud-2
Peter Landgren wrote:

> Serge,
>
> I tested your patch and all was OK. What problem did the patch solve?
>
> I found another bug yesterday:
>
> "The alphabetical navigation bar does not work if there are more than 27
> letters in the alphabet. The second line is never stored in the web page."
>
> Fixed in r 12469 for branch.
> Fixed in r 12471 for trunk.
>
> /Peter
>
> > There is three bugs in NarrativeWeb.py
> >
> > 1 : The normalize function use NFC in one place and NFKC in all others.
> >     either we use NFC or we use NFKC, but don't mix these two modes.
> > 2 : One sort function does not use the locale.strcoll
> > 3 : There is a problem with the following statement:
> >     if letter is not last_letter:

I think the "if" at line 1528 was the problem.
Why? I don't know.
My problem was with the first letter of Noiraud and noiraud.
In theory, once upper() is applied to the first character,
"N".upper() and "n".upper() are the same,
but "if letter is not last_letter:" doesn't work in this case.
I replaced "is not" with "!=" and it works!

I'd be glad if someone could explain that!

Serge


Re: Sort mystery

Kees Bakker
On Sunday 19 April 2009, Serge Noiraud wrote:

> > > 3 : There is a problem with the following statement:
> >
> > >     if letter is not last_letter:
>
> I think the "if" was the problem at line 1528.
> Why  ? I don't know.
> My problem was between the firt letter of Noiraud and noiraud.
> Theorically, with the upper method applied to the first character,
> "N".upper() and "n".upper() are the same.
> but "if letter is not last_letter:" doesn't work in this case.
> I replace "is not" by "!=" and it works !
>
> If someone can explain that !

This is basic Python. I'll have to look up the section in the manual, but in general
you have to understand that "is not" is very different from "!=".

There has been a tendency to replace "!=" with "is not" to improve the speed
of Gramps. But that optimization has to be done with great care: not only
is it sometimes wrong (as shown above), but often it has no measurable effect
on the speed of Gramps.
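A minimal sketch of the header loop with a value comparison (a simplified, hypothetical version of the logic, not the actual NarrativeWeb.py code). "is not" is the right operator only for singletons such as None; for strings computed at run time, equal values are not guaranteed to be the same object, which is consistent with the repeated "V" header before "von Vicken" in Peter's table.

```python
def letter_headers(surnames):
    """Return the index letters in order, one per run of surnames sharing a first letter."""
    headers = []
    last_letter = None
    for surname in surnames:
        letter = surname[0].upper()
        # Value comparison. "letter is not last_letter" would test object identity;
        # two equal strings built at run time need not be the same object, so the
        # identity test can report a "new" letter and emit a duplicate header.
        if letter != last_letter:
            last_letter = letter
            headers.append(letter)
    return headers


if __name__ == "__main__":
    print(letter_headers(["Noiraud", "noiraud", "Petit"]))  # ['N', 'P']

    a = "".join(["Noi", "raud"])
    b = "".join(["No", "iraud"])
    print(a == b)  # True: equal values
    print(a is b)  # identity: an implementation detail, never rely on it
```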
--
Kees


Re: Sort mystery

Peter Landgren

On Monday 20 April 2009 09.41.34, Kees Bakker wrote:
> On Sunday 19 April 2009, Serge Noiraud wrote:
> > > > 3 : There is a problem with the following statement:
> > > >
> > > > if letter is not last_letter:
> >
> > I think the "if" was the problem at line 1528.
> > Why ? I don't know.
> > My problem was between the first letter of Noiraud and noiraud.
> > Theoretically, with the upper method applied to the first character,
> > "N".upper() and "n".upper() are the same.
> > but "if letter is not last_letter:" doesn't work in this case.
> > I replace "is not" by "!=" and it works !
> >
> > If someone can explain that !
>
> This is basic Python. I'll have to look up the section in the manual. In
> general you have to understand that "is not" is really very different from
> "!=".
>
> There has been a tendency to replace "!=" by "is not" to improve the speed
> of Gramps. But that optimization has to be done with great care. Not only
> is it sometimes wrong (as shown above), but many times it has no influence
> on the speed of Gramps.

Yes,

"!=" tests whether the values are unequal, while "is not" tests whether the two operands are different objects, i.e. do not refer to the same memory address.

/Peter
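To illustrate Peter's explanation, here is a small sketch comparing value equality with identity. The strings are built at run time so that, in CPython, they are two separate objects with the same value. Whether two equal short strings share one object is an interpreter detail, which is why an `is not` test can seem to work for some letters and fail for others.

```python
# Two strings with the same value, each built at run time.
surname_a = "".join(["Noi", "raud"])
surname_b = "NOIRAUD".capitalize()

print(surname_a == surname_b)   # True: "==" and "!=" compare values
# "is" compares object identity, i.e. id(); in CPython these are two
# distinct objects here, so this is typically False.
print(surname_a is surname_b)
# "is" is exactly an id() comparison while both objects are alive.
print((surname_a is surname_b) == (id(surname_a) == id(surname_b)))  # True
```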



Re: Sort mystery

Serge Noiraud-2
Hi, everybody.

Thanks for the info on "is not" and "!=".

If you think the problem is solved, I can commit the changes to 3.1 and trunk.
There is no bug report for it, so we might forget.



Re: Sort mystery

Benny Malengier


2009/4/20 Serge Noiraud <[hidden email]>
> Hi, everybody.
>
> Thanks for the info on "is not" and "!=".
>
> If you think the problem is solved, I can commit the changes in 3.1 and trunk
> There is no bug report for that and we could forget.

Please do. Add a comment above the "!=" saying that "!=" must be used, not "is not", or some other developer might mistakenly change it back to "is not".
It would be good to add a bug item for 3.1.2 and then fix it, so that the 3.1.2 roadmap gives a good overview of when to do a new release.

Benny.
 




------------------------------------------------------------------------------
Crystal Reports &#45; New Free Runtime and 30 Day Trial
Check out the new simplified licensign option that enables unlimited
royalty&#45;free distribution of the report engine for externally facing
server and web deployment.
http://p.sf.net/sfu/businessobjects
_______________________________________________
Gramps-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-devel

Re: Sort mystery

Kees Bakker
On Friday 24 April 2009, Benny Malengier wrote:

> 2009/4/20 Serge Noiraud <[hidden email]>
>
> > Hi, everybody.
> >
> > Thanks for the info on "is not" and "!=".
> >
> > If you think the problem is solved, I can commit the changes in 3.1 and
> > trunk
> > There is no bug report for that and we could forget.
> >
>
> Please do. Add a comment above the != that != must be used and not is not,
> or some other developer might mistakenly change it back to is not.

I don't agree with that comment. I hope you're not suggesting that we look up
all occurrences of "!=" and mark them as "yes, we really want this, don't change
it into 'is not'".

But I do think we should make every developer aware that "is not" is different
from "!=".
--
Kees


Re: Sort mystery

Benny Malengier


2009/4/24 Kees Bakker <[hidden email]>
> On Friday 24 April 2009, Benny Malengier wrote:
> > 2009/4/20 Serge Noiraud <[hidden email]>
> >
> > > Hi, everybody.
> > >
> > > Thanks for the info on "is not" and "!=".
> > >
> > > If you think the problem is solved, I can commit the changes in 3.1 and
> > > trunk
> > > There is no bug report for that and we could forget.
> > >
> >
> > Please do. Add a comment above the != that != must be used and not is not,
> > or some other developer might mistakenly change it back to is not.
>
> I don't agree with that comment. I hope you're not suggesting that we look up
> all occurrences of "!=" and mark them as "yes, we really want this, don't change
> it into 'is not'".
>
> But I do think we should make every developer aware that "is not" is different
> from "!=".

OK, then run svn annotate on the code and write to the person who changed it.
I meant that since it appears to have been changed once, it might be changed again. Writing to that person directly would indeed be the better thing to do.

Benny



Re: Sort mystery

Peter Landgren

Benny,

Do you have any idea if it's possible to fix the problem, that started this thread?

/Peter



Re: Sort mystery

Benny Malengier


2009/4/24 Peter Landgren <[hidden email]>

> Benny,
>
> Do you have any idea if it's possible to fix the problem, that started this thread?
>
> /Peter


If your locale wants V and W to be together, we need to respect that.
So the issue is only that the first letter is shown every time again, no? In other words, the error is in the sort on narrative web and the way the first letter is used for quick jump into the list of names.

That indicates that the procedure that adds these first letters must be made more clever:

1/ If symbols are different but equal in the sort of the locale, consider them as one group. I guess sorting "va" and "wb", and then "vb" and "wa", would show whether v and w are one group: if the second letter decides the order both times (va before wb, and wa before vb), the locale treats v and w as the same letter. So the logic for a small function is not difficult.

2/ Sorting of upper and lower case, e.g. v and V. Is this not locale related too? If not, we should capitalize before sorting. I would assume the locale sort takes this into account, but I would have to check.

Benny
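Benny's point 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Gramps code: `same_index_group`, `index_groups` and `swedish_like_cmp` are hypothetical names, and the probe assumes "a" sorts before "b" and that the locale has no contractions involving the letters tested. Whether `locale.strcoll` actually merges V and W depends on the installed locale data, so the usage example uses a hypothetical stand-in comparison instead of a real Swedish locale.

```python
import functools
import locale


def same_index_group(x, y, cmp):
    """Benny's probe: do letters x and y collate as one letter?

    If they are the same letter at the primary level, the following
    letter decides the order, so x+"a" < y+"b" and y+"a" < x+"b".
    If they are different letters, one of those comparisons flips.
    """
    return cmp(x + "a", y + "b") < 0 and cmp(y + "a", x + "b") < 0


def index_groups(names, cmp=locale.strcoll):
    """Sort names with cmp and group them under shared index letters."""
    ordered = sorted(names, key=functools.cmp_to_key(cmp))
    groups = []  # each entry: [list of first letters, list of names]
    for name in ordered:
        letter = name[0].upper()
        if groups and same_index_group(groups[-1][0][0], letter, cmp):
            if letter not in groups[-1][0]:
                groups[-1][0].append(letter)
            groups[-1][1].append(name)
        else:
            groups.append([[letter], [name]])
    # A merged group gets a header such as "V,W".
    return [(",".join(sorted(letters, key=functools.cmp_to_key(cmp))), members)
            for letters, members in groups]


def swedish_like_cmp(a, b):
    """Hypothetical stand-in for a collation in which w sorts as v and
    case is ignored at the first level (raw string only breaks ties)."""
    ka = (a.lower().replace("w", "v"), a)
    kb = (b.lower().replace("w", "v"), b)
    return (ka > kb) - (ka < kb)


names = ["Wahlberg", "Vall", "Westin", "Andersson", "Vik"]
print(index_groups(names, swedish_like_cmp))
# [('A', ['Andersson']), ('V,W', ['Wahlberg', 'Vall', 'Westin', 'Vik'])]
```

Because the probe only calls a three-way comparison function, the same grouping code can be fed `locale.strcoll` or any other collator wrapped that way. The same probe also answers point 2 for such a collator: `same_index_group("v", "V", swedish_like_cmp)` is true, so upper and lower case land in one group.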


Re: Sort mystery

Peter Landgren

> 2009/4/24 Peter Landgren <[hidden email]>
>
> > Benny,
> >
> > Do you have any idea if it's possible to fix the problem, that started
> > this thread?
> >
> > /Peter
>
> If your locale wants V and W to be together, we need to respect that.

Yes, that's OK, as it's the way it's done in the phone book.

> So the issue is only that the first letter is shown every time again, no?
> In other words, the error is in the sort on narrative web and the way the
> first letter is used for quick jump into the list of names.

Yes. I would like to treat V and W as equal, as it's done in the sort.

> That indicates that the procedure that adds these first letters must be
> made more clever:

So the first letter should be V,W for all surnames/places beginning with V or W.

Both V and W in the alphabet index at the top of the page should point to V,W.

And would this only apply to the Swedish locale?

For example, in English and many other languages, V and W are regarded as different letters.

"W" is not used in Swedish, except in Swedish names to make them look nicer. : )

> 1/ If symbols are different but equal in the sort of the locale, consider
> them as one group. I guess doing sort of va and wb then vb and wa would
> indicate that v and w are one group, so the logic for a small function is
> not difficult.

I don't follow you here.

> 2/ sorting of caps and small caps eg v and V. Is this not locale related
> too? If not, we should capitalize before sorting. I would assume the locale
> sort takes this into account however, would have to check

This is OK now. Names with the prefix "von" come in the correct order.

Isn't this a problem with other letters in other locales?

/Peter



Re: Sort mystery

Serge Noiraud-2
Hi, everybody

I committed the patch to SVN.
See: http://www.gramps-project.org/bugs/view.php?id=2933

I had the same problem with French, and now it works correctly.
I can reverse the patch if you want to find another solution,
but I still say the problem was the "is not", which I replaced with "!=":
with "is not", we got several groups for the same letter.



Re: Sort mystery

Tim Lyons
Administrator
In reply to this post by Benny Malengier
A challenge for searching for an answer!!


Benny Malengier wrote:
> 2009/4/24 Peter Landgren <[hidden email]>
> That indicates that the procedure that adds these first letters must be made
> more clever:
>
> 1/ If symbols are different but equal in the sort of the locale, consider
> them as one group. I guess doing sort of va and wb then vb and wa would
> indicate that v and w are one group, so the logic for a small function is
> not difficult.
You expand on this algorithm in http://www.gramps-project.org/bugs/view.php?id=2933#c9317. See http://www.unicode.org/charts/uca/ for the main collation chart with primary differences marked.

I entirely agree, and plan to implement this in NarWeb.

However, I have one problem.

I can't find out how to determine the letter that has a primary difference from the current letter (sorry, that's not quite the right wording, but I am not sure how to express it).

For example, I do a sort, and the first few names are "Ándre, Arnot". The algorithm shows that these should be grouped together. But which letter should be used for the index header? In this case, it should be "a" (or "A" if I upper-case everything), as this is the letter from which "Á" and "A" differ only at the secondary or tertiary level.

In another language, "Á" might have a primary difference from "Z", and then the sort order would be  "Andrew, Arnot, Zulu, Ándre". In this case the index header should be "Á". So I can't just normalise the character to remove accents etc.

I have studied Unicode, CLDR and ICU and Googled extensively, but I can't find out how to determine the preceding primary character!

Can anyone help?
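One possible heuristic for this question, offered only as a sketch: take the base character of the letter's canonical decomposition (NFD) and use it as the header only if Benny's probe says the collation still treats it as the same primary letter. This is an assumption, not a rule taken from Unicode, CLDR or ICU, and it only helps for letters that decompose (it does nothing for, say, "Ø"). `header_letter`, `make_cmp` and the two comparison functions are hypothetical stand-ins for an accent-insensitive locale and for a locale that sorts "Á" after "Z".

```python
import unicodedata


def same_index_group(x, y, cmp):
    """Benny's probe: x and y are one index letter if the following
    letter decides their order in both directions."""
    return cmp(x + "a", y + "b") < 0 and cmp(y + "a", x + "b") < 0


def header_letter(letter, cmp):
    """Pick the index header for a group whose first name starts with letter.

    Heuristic: use the NFD base character (e.g. "A" for "Á") only if the
    collation treats it as the same primary letter; otherwise keep letter.
    """
    base = unicodedata.normalize("NFD", letter)[0]
    if base != letter and same_index_group(base, letter, cmp):
        return base
    return letter


def make_cmp(primary):
    """Build a three-way comparison from a primary-key function;
    ties are broken by the raw string."""
    def cmp(a, b):
        ka, kb = (primary(a), a), (primary(b), b)
        return (ka > kb) - (ka < kb)
    return cmp


def strip_accents(s):
    """Lower-case s and drop combining marks after NFD decomposition."""
    return "".join(c for c in unicodedata.normalize("NFD", s.lower())
                   if not unicodedata.combining(c))


# Hypothetical collation where accents are only a secondary difference.
accent_secondary = make_cmp(strip_accents)
# Hypothetical collation where "á" is a separate letter sorted after "z"
# ("{" is the ASCII character right after "z").
a_acute_after_z = make_cmp(lambda s: s.lower().replace("\u00e1", "{"))

print(header_letter("\u00c1", accent_secondary))  # A
print(header_letter("\u00c1", a_acute_after_z))   # Á
```

With the second comparison, "Andrew, Arnot, Zulu, Ándre" sorts in the order described above, and "Á" correctly stays its own header.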