GEDCOM validator does not like my file


GEDCOM validator does not like my file

Bill Gee

Hello everyone -

 

After I saw the announcement yesterday regarding the release of GEDCOM standard version 5.5.5, I poked around and found a web site that will run validation across a GEDCOM file.

 

http://ged-inline.elasticbeanstalk.com/validate

 

Just to check it out, I exported my entire database as a GEDCOM, then submitted it to the validator. It produced over 6800 warnings. Ouch! Below are the first few dozen lines of the report.

 

Almost all of the complaints are saying that FILE content is over 30 characters. These are mostly URLs, so there is not much opportunity to make them shorter. Is this a problem in the specification? Or in the way Gramps creates a GEDCOM? Should I even worry about it? 30 characters seems way too short to me.

 

It is also complaining about empty ADDR lines. Yep, looking at my GEDCOM file I see over 10,000 ADDR lines that have no content. Every one is associated with a PLAC tag, and most also have some combination of CITY, STAE and CTRY tags nearby. Is this something to be concerned about?

 

I see some lines like this:

 

Invalid content for FORM tag: 'docx' is not a valid <MULTIMEDIA_FORMAT>

 

It also notes that png, xlsx and jpeg are not valid. What is valid?

 

I am using Gramps 5.1.1 running under 64-bit Fedora 30.

--

Bill Gee

 

=================

Validation report for GeeFamily2.ged

 

 

Generated by              Gramps
Submitted by              Bill Gee
Encoding                  UTF-8
GEDCOM version in file    5.5.1
GEDCOM version assumed    5.5.1

Analysis time    23 seconds to analyse the file (excluding upload time)
Speed            425 records per second

Lines            306151    Number of lines in the GEDCOM file
Records            9795    Number of records
Warnings           6865    Total number of warning messages
User-defined       1728    Number of lines with user-defined tags

Individuals        5623    Number of individuals in the GEDCOM file
Males              3009    Number of males
Females            2612    Number of females
Unknown               2

Families           2038    Number of families
Marriages          1284    Number of marriages
Places            11086    Number of places mentioned (not necessarily unique)
Source records       94    Number of source records

 

*** Line 28: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>

*** Line 42: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>

*** Line 62: Tag FORM is not allowed under OBJE

*** Line 64: Invalid content for FILE tag: 'https://www.familysearch.org/tree/person/L11D-FKX' is more than 30 characters, the maximum length for <MULTIMEDIA_FILE_REFERENCE>

*** Line 64: Mandatory tag FORM not found under FILE

*** Line 78: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>

*** Line 84: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>

*** Line 92: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>

*** Line 119: Tag FORM is not allowed under OBJE
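With several thousand warnings, it may help to group them by kind before reading them one by one. The following is a minimal Python sketch that assumes the report has been saved as plain text with one `*** Line N: message` entry per line, as in the excerpt above. The function names and the category labels are made up for illustration and are not part of the validator.

```python
import re
from collections import Counter

# Matches report lines such as "*** Line 28: Invalid content for ADDR tag: ..."
WARNING_RE = re.compile(r"^\*\*\* Line (\d+): (.*)$")


def classify(message: str) -> str:
    """Reduce one warning message to a coarse category (hypothetical grouping)."""
    m = re.match(r"Invalid content for (\w+) tag", message)
    if m:
        return f"invalid {m.group(1)} content"
    m = re.match(r"Tag (\w+) is not allowed under (\w+)", message)
    if m:
        return f"{m.group(1)} not allowed under {m.group(2)}"
    m = re.match(r"Mandatory tag (\w+) not found under (\w+)", message)
    if m:
        return f"{m.group(1)} missing under {m.group(2)}"
    return "other"


def tally_warnings(report_text: str) -> Counter:
    """Count warnings per category; lines that are not warnings are ignored."""
    counts = Counter()
    for line in report_text.splitlines():
        m = WARNING_RE.match(line.strip())
        if m:
            counts[classify(m.group(2))] += 1
    return counts


# The nine warnings quoted in this post.
SAMPLE_REPORT = """
*** Line 28: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 42: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 62: Tag FORM is not allowed under OBJE
*** Line 64: Invalid content for FILE tag: 'https://www.familysearch.org/tree/person/L11D-FKX' is more than 30 characters, the maximum length for <MULTIMEDIA_FILE_REFERENCE>
*** Line 64: Mandatory tag FORM not found under FILE
*** Line 78: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 84: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 92: Invalid content for ADDR tag: '' missing value for <ADDRESS_LINE>
*** Line 119: Tag FORM is not allowed under OBJE
"""

if __name__ == "__main__":
    for category, count in tally_warnings(SAMPLE_REPORT).most_common():
        print(f"{count:6d}  {category}")
```

Run over the full report, a tally like this would show whether the 6865 warnings really come down to a handful of distinct causes.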

 



--
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users
https://gramps-project.org

Re: GEDCOM validator does not like my file

Dirk Munk (via Gramps-users mailing list)
First of all, you can find the GEDCOM specifications here:

https://www.gedcom.org/gedcom.html

According to the 5.5.1 spec, FILE indeed can only be up to 30 characters in length. I don't know if FILE is supposed to take a URL. In the 5.5.5 spec I found that a URL can be up to 2047 characters in length.

Empty lines are not allowed in 5.5.5.

The <MULTIMEDIA_FORMAT> only has a very limited number of allowed values, and indeed 'docx' and 'jpeg' are not among them.
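To answer the "What is valid?" question concretely: as far as I can tell from the 5.5.1 spec, <MULTIMEDIA_FORMAT> is limited to bmp, gif, jpg, ole, pcx, tif and wav. A small Python sketch of that check follows. The set is my reading of the spec and should be verified against the document itself, and the case-insensitive comparison is an assumption rather than something the spec or the validator is known to do.

```python
# Values listed for <MULTIMEDIA_FORMAT> in GEDCOM 5.5.1, as I read the spec.
# Treat this list as an assumption and check it against the specification.
GEDCOM_551_FORMATS = frozenset({"bmp", "gif", "jpg", "ole", "pcx", "tif", "wav"})


def is_valid_format(value: str) -> bool:
    """Return True if `value` is one of the 5.5.1 multimedia formats.

    Comparing case-insensitively is an assumption made for this sketch.
    """
    return value.strip().lower() in GEDCOM_551_FORMATS


if __name__ == "__main__":
    for candidate in ("jpg", "jpeg", "png", "docx", "xlsx"):
        verdict = "valid" if is_valid_format(candidate) else "not valid"
        print(f"{candidate}: {verdict}")
```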

The problem is that the 5.5.1 spec is 20 years old and very outdated in these respects.

The 5.5.5 spec seems to be much better, and better defined (after a very quick read), but the authors point to several things that may need improvement in new versions.





Re: GEDCOM validator does not like my file

Bill Gee

Hmmm.... The standard against which the validation took place was 5.5.1.

 

I do not see any empty lines in my file. There are lines with a tag that has no value - but that is not an empty line!
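The distinction can be made explicit with a few lines of Python. This is only a sketch of how a GEDCOM line splits into level, optional cross-reference, tag and value. The `GedcomLine` type and `parse_line` function are names invented here, and malformed lines (for example a non-numeric level) are not handled.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]  # e.g. "@I1@" on record lines, otherwise None
    tag: str
    value: str           # "" when the tag has no value


def parse_line(raw: str) -> Optional[GedcomLine]:
    """Split one GEDCOM line; return None for a truly empty line."""
    text = raw.rstrip("\r\n")
    if not text.strip():
        return None  # an empty line: no level, no tag, nothing
    parts = text.lstrip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI"
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        return GedcomLine(level, parts[1], rest[0], rest[1] if len(rest) > 1 else "")
    tag = parts[1] if len(parts) > 1 else ""
    value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, None, tag, value)


if __name__ == "__main__":
    print(parse_line(""))         # None: an empty line
    print(parse_line("2 ADDR"))   # a tag with no value, which is not an empty line
    print(parse_line("2 PLAC Olathe, Johnson, Kansas, USA"))
```

A line like `2 ADDR` parses to a tag with an empty value, which is what the validator reports, while an empty line has nothing to parse at all.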

 

Reading the 5.5.1 spec - It appears that most tags must have a value. That includes the ADDR tags that appear all over the file I exported. Why does Gramps produce ADDR tags that have no value?

 

I see that 5.5.1 has a tag called WWW (page 173). Why are URLs exported as FILE instead of WWW?

 

The annotated 5.5.1 specification document states that pathnames in FILE tags should allow up to 259 characters. The limit of 30 was a mistake in the original spec. (page 108)
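To see how much difference the two limits make, one could count the FILE values that exceed each of them. Here is a rough Python sketch. The helper name is invented, the splitting is deliberately simplistic (level, tag, value separated by single spaces), and the sample lines are made up apart from the URL quoted in the report.

```python
def long_file_values(lines, limit):
    """Yield (line_number, value) for FILE lines whose value exceeds `limit`."""
    for number, raw in enumerate(lines, start=1):
        parts = raw.strip().split(" ", 2)
        if len(parts) == 3 and parts[1] == "FILE" and len(parts[2]) > limit:
            yield number, parts[2]


if __name__ == "__main__":
    sample = [
        "1 OBJE",
        "2 FORM URL",
        "2 FILE https://www.familysearch.org/tree/person/L11D-FKX",
        "1 OBJE",
        "2 FORM jpg",
        "2 FILE photo.jpg",
    ]
    for limit in (30, 259):  # original 5.5.1 limit vs. the annotated edition's limit
        offenders = list(long_file_values(sample, limit))
        print(f"limit {limit}: {len(offenders)} FILE value(s) too long")
```

In a real export the file would be read with `open(path, encoding="utf-8")` and passed in line by line.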

 

Page 110 of the annotated 5.5.1 spec talks about JPG vs. JPEG. It looks to me like the validator does not properly handle this. Page 109 acknowledges that there are missing formats.


--

Bill Gee

 

 


On Wednesday, October 9, 2019 12:25:16 PM CDT Dirk Munk via Gramps-users wrote:

> First of all, you can find the GEDCOM specifications here:
>
> https://www.gedcom.org/gedcom.html
>
> According to the 5.5.1 spec, File indeed can only be up to 30 characters
> in length. I don't know if FILE is suppose to take an URL. In the 5.5.5
> spec. I found that an URL can be up to 2047 characters in length.
>
> Empty lines are not allowed in 5.5.5.
>
> The <MULTIMEDIA_FORMAT> only has a very limited number of allowed tag,
> and indeed 'docx' and 'jpeg' are not among them.
>
> The problem is that the 5.5.1 spec is 20 years old and very outdated in
> these respects.
>
> The 5.5.5 spec seems to be much better, and better defined (after a very
> quick read), but the authors point to several things that may need
> improvement in new versions.


Re: GEDCOM validator does not like my file

Nick Hall
On 09/10/2019 20:44, Bill Gee wrote:

> Reading the 5.5.1 spec - It appears that most tags must have a value. That includes the ADDR tags that appear all over the file I exported. Why does Gramps produce ADDR tags that have no value?

Possibly a bug. I would expect the street part of a place to be exported in the ADDR tag. I'm not sure why it is exported if the place doesn't contain a street.

> I see that 5.5.1 has a tag called WWW (page 173). Why are URLs exported as FILE instead of WWW?

The FILE tag should contain the path of a media object, which may be a URL in Gramps.

Nick.





Re: GEDCOM validator does not like my file

enno
On 09-10-19 at 21:57, Nick Hall wrote:
> On 09/10/2019 20:44, Bill Gee wrote:
>
> > Reading the 5.5.1 spec - It appears that most tags must have a value. That includes the ADDR tags that appear all over the file I exported. Why does Gramps produce ADDR tags that have no value?
>
> Possibly a bug. I would expect the street part of a place to be exported in the ADDR tag. I'm not sure why it is exported if the place doesn't contain a street.

As far as I can see, there is no need to export an ADDR, unless the user has specified one, meaning that exporting an ADDR with every PLAC is just wrong. Both ADDR and PLAC are optional.
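Until the export itself changes, the value-less ADDR lines could be filtered out of the exported file afterwards. The Python sketch below is one cautious way to do that, not anything Gramps provides: it drops an empty ADDR line only when the next line is not at a deeper level, so an ADDR that carries subordinate lines such as CITY, STAE or CTRY is left alone (removing it would orphan them). The function name is invented, and whether the subordinate tags in a given file really sit under ADDR is an assumption to check in your own export.

```python
def drop_empty_addr(lines):
    """Return `lines` without value-less ADDR lines that have no subordinate lines."""
    result = []
    for i, raw in enumerate(lines):
        parts = raw.strip().split(" ", 2)
        is_empty_addr = (
            len(parts) >= 2
            and parts[1] == "ADDR"
            and (len(parts) == 2 or not parts[2].strip())
        )
        if is_empty_addr:
            level = int(parts[0])
            # Look at the following line: a higher level number means it is a child.
            nxt = lines[i + 1].strip().split(" ", 1) if i + 1 < len(lines) else []
            has_child = bool(nxt) and nxt[0].isdigit() and int(nxt[0]) > level
            if not has_child:
                continue  # skip the childless, value-less ADDR line
        result.append(raw)
    return result


if __name__ == "__main__":
    sample = [
        "1 BIRT",
        "2 PLAC Kansas City",
        "2 ADDR",
        "1 DEAT",
        "2 PLAC Olathe",
        "2 ADDR",
        "3 CITY Olathe",
    ]
    print("\n".join(drop_empty_addr(sample)))
```

An ADDR with no value but with subordinate lines would still be flagged by the validator, so this only addresses part of the warnings.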

Enno





Re: GEDCOM validator does not like my file

prculley
The PLAC/ADDR code is one area where Gramps export is a bit behind the times.  It dates back to much earlier versions of GEDCOM.  This is one area that was addressed in the Enhanced Places GEPS, which should be available someday.

The OBJE.FORM export was a GEDCOM 5.5 structure; it probably should be updated to meet the 5.5.1 spec, although it is very likely acceptable to most importers (the validators would, of course, complain).
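For anyone comparing the two layouts, here is a small Python sketch of how a multimedia link is arranged in each version as I understand the specs: in 5.5 FORM sits directly under OBJE, while in 5.5.1 FORM is subordinate to FILE. That matches the two validator messages "Tag FORM is not allowed under OBJE" and "Mandatory tag FORM not found under FILE". The function is purely illustrative and is not taken from the Gramps exporter; other subordinate tags such as TITL are left out.

```python
def media_link_lines(level, path, fmt, layout="5.5.1"):
    """Return the GEDCOM lines of a multimedia link in the given layout.

    "5.5":   OBJE, then FORM and FILE as siblings under OBJE.
    "5.5.1": OBJE, then FILE, with FORM one level below FILE.
    """
    if layout == "5.5":
        return [
            f"{level} OBJE",
            f"{level + 1} FORM {fmt}",
            f"{level + 1} FILE {path}",
        ]
    if layout == "5.5.1":
        return [
            f"{level} OBJE",
            f"{level + 1} FILE {path}",
            f"{level + 2} FORM {fmt}",
        ]
    raise ValueError(f"unknown layout: {layout!r}")


if __name__ == "__main__":
    for layout in ("5.5", "5.5.1"):
        print(f"-- {layout} --")
        print("\n".join(media_link_lines(1, "photo.jpg", "jpg", layout)))
```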

And finally, how a URL is exported depends on where and how it is used. If the tree uses Media with a URL in the filename field, then a URL in FILE is normal. If the tree has URLs in the Internet tab of Persons etc., then a URL tagged as "Web Home" gets a WWW tag; otherwise it gets the older OBJE.FILE.FORM URL tag that the validator is complaining about. As noted in several places (like the GEDCOM 5.5.5 spec), the FORM list on OBJE is seriously dated in the GEDCOM 5.5.1 spec and is missing many formats. As a result, Gramps (and others) use this field for a lot of "mime" types and for "URL". This may not be completely valid, but it avoids some of the potential for data loss.
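The Internet-tab rule described above can be written down as a short decision function. This Python sketch only restates that description and is not the actual Gramps export code. The function name and the string used for the URL type are assumptions, and the OBJE branch uses the 5.5-style layout (FORM directly under OBJE) because that is what the validator messages suggest the file contains.

```python
def internet_url_lines(level, url_type, url):
    """Return GEDCOM lines for a URL from the Internet tab (illustrative only)."""
    if url_type == "Web Home":
        # Home pages map onto the 5.5.1 WWW tag.
        return [f"{level} WWW {url}"]
    # Everything else falls back to a media link with the non-standard FORM "URL".
    return [
        f"{level} OBJE",
        f"{level + 1} FORM URL",
        f"{level + 1} FILE {url}",
    ]


if __name__ == "__main__":
    print("\n".join(internet_url_lines(1, "Web Home", "https://gramps-project.org")))
    print("\n".join(internet_url_lines(
        1, "Web Search", "https://www.familysearch.org/tree/person/L11D-FKX")))
```

So changing the type of a link to "Web Home" would, under this reading, move it from a FILE line to a WWW line in the export.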

Paul C.

On Wed, Oct 9, 2019 at 3:19 PM Enno Borgsteede <[hidden email]> wrote:
> On 09-10-19 at 21:57, Nick Hall wrote:
> > On 09/10/2019 20:44, Bill Gee wrote:
> >
> > > Reading the 5.5.1 spec - It appears that most tags must have a value. That includes the ADDR tags that appear all over the file I exported. Why does Gramps produce ADDR tags that have no value?
> >
> > Possibly a bug. I would expect the street part of a place to be exported in the ADDR tag. I'm not sure why it is exported if the place doesn't contain a street.
>
> As far as I can see, there is no need to export an ADDR, unless the user has specified one, meaning that exporting an ADDR with every PLAC is just wrong. Both ADDR and PLAC are optional.
>
> Enno
