image or PDF

Gazza
I am just starting to use Gramps and checking out the features. One
question I have is what format I should use when scanning images and
single-page documents, such as a birth certificate, to add to a
Gallery. I notice that a thumbnail and preview are available for a jpg
file, but only a generic thumbnail and no preview for a pdf. I will need
to scan multi-page documents as multi-page pdfs, but should I scan all
single pages as jpg or as pdf, or should I scan only pictures and photos
as jpg and only documents as pdf?

Cheers
Garry
Gramps 4.1.2-1
Linux Mint 17.1

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users

Re: image or PDF

Ron Johnson
On 03/29/2015 01:06 AM, Garry Seeley wrote:
> or should I only scan pictures and photos as jpg and only documents as pdf?

That's tempting, if for no other reason than the Joint *Photographic*
Experts Group tuned the jpg file format for photographs, and the Portable
*Document* Format allows you to embed text in the document.

However, such PDF scanning requires a link to quite powerful OCR, which Linux
doesn't have; otherwise, you just get a jpg wrapped in pdf, and that's a
waste.  So, you might want to scan documents as high quality TIFF files, and
then if good OCR for Linux ever appears, you can run the tiffs through it.

--
My word, man!  Don't you know your quantum statistics?


Re: image or PDF

Philip Weiss
In reply to this post by Gazza
I highly recommend scanning as TIFF files, as those are generally lossless (meaning no information is lost to image compression).  JPEG is a "lossy" format, meaning information is lost in its built-in image compression.

I will occasionally scan documents into PDF files (I've found VueScan software to have adequate OCR), but mostly just for the ease of stitching multiple pages together.

Gramps specific, most scans of documents and certificates and newspapers and the like get attached to citations in my tree, rather than directly to a person, and the citations support various claims about a person and his/her events.  About the only images I attach directly to people are photos and paintings of them.  You don't *have* to do that, but I've found it's easier to track images to the citations and sources that way.  And it's a pain in the butt to go back and switch to that later on.

Phil


Re: image or PDF

Sebastian Schubert
In reply to this post by Gazza
As the others also pointed out, without OCR, pdf is just a useless
wrapper around the image file. If you want to open the file in an image
editing application, this wrapper has to be removed, which I sometimes
had issues with... Also, I suppose OCR does not work with hand-written
documents, in particular German Kurrent, so I did not even try that.

Consequently, I also use image formats for multi-page documents. As
Philip said, one can attach images to citations. Here, I want to attach
exactly the page that matches the citation. With a single pdf, you can only
attach the complete document.

My workflow:

* Scan tiff.
* Convert tiff to png to save space (lossless)
* Import png into Lightroom for some tuning
* Output jpg at rather high quality

The jpgs are used in Gramps. While the last step is lossy, the jpgs are much
smaller and still fine for me. They open faster. Also, since the files
are used in the narrative web report as well, jpgs are much better here.

I have the same workflow also for pictures. Naturally, the 3rd step is
more important here.
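
A minimal command-line sketch of the lossless and lossy conversion steps in the workflow above, assuming ImageMagick's `convert` is installed. The Lightroom tuning step is skipped here, and the function names, file names, and the quality value of 92 are illustrative assumptions, not Sebastian's actual settings.

```shell
#!/bin/sh
# Sketch only: TIFF -> PNG (lossless archive copy) -> JPG (lossy, for Gramps).
# Assumes ImageMagick's `convert` is on PATH. All names are hypothetical.

# swap_ext FILE EXT: print FILE with its last extension replaced by EXT.
swap_ext() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

# tiff_to_png_and_jpg FILE.tiff: write FILE.png and FILE.jpg next to it.
tiff_to_png_and_jpg() {
    src=$1
    png=$(swap_ext "$src" png)
    jpg=$(swap_ext "$src" jpg)
    # PNG compression is lossless, so nothing from the scan is discarded here.
    convert "$src" "$png" || return 1
    # JPEG is lossy; 92 is an assumed stand-in for "rather high quality".
    convert "$png" -quality 92 "$jpg"
}
```

It could be called as `tiff_to_png_and_jpg birth_certificate.tiff`, which would leave the PNG as the archive copy and the JPG as the file to add to Gramps.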

HTH
Sebastian


Re: image or PDF

Gazza
In reply to this post by Ron Johnson
I am using Simple Scan that comes with LM so my PDF scans are just jpg
embedded in a pdf file, not a true text pdf file. That makes sense.



Re: image or PDF

Gazza
In reply to this post by Philip Weiss
Phil, Yes I ended up attaching the images/documents to the citations after some trial and error. It seemed to make more sense that way.


Re: image or PDF

Paul Franklin-5
In reply to this post by Gazza
On 3/29/15, Garry Seeley <[hidden email]> wrote:
> ... I notice that a thumbnail and preview are available with a jpg
> file but just a generic thumbnail and no preview for a pdf. ...

https://gramps-project.org/bugs/view.php?id=8161


Re: image or PDF

Gazza
In reply to this post by Sebastian Schubert
I am tending to think the same as you Sebastian and I will go for jpg
(maybe after scanning and editing photos in some other format first).
But I do have some documents that are multi page that I would like to
attach complete. My options in that case would be to attach several jpg
files and name them something like ...page1/2.jpg, ...page2/2.jpg.
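
One caveat with names like ...page1/2.jpg: on Linux the `/` character is the path separator and cannot appear in a file name. A small sketch of an alternative naming scheme follows; the `page_name` helper and the `_pageNNofMM` pattern are assumptions made for illustration, and the zero padding is only there so that the files sort in page order.

```shell
# page_name BASE PAGE TOTAL: print a file name such as BASE_page01of02.jpg
page_name() {
    printf '%s_page%02dof%02d.jpg\n' "$1" "$2" "$3"
}

# Example: print the names for a three-page document.
for n in 1 2 3; do
    page_name pension "$n" 3
done
```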



Re: image or PDF

jerome
In reply to this post by Gazza
Hello,

I remember a workaround some years back for displaying thumbnails
with a snapshot of the OpenDocument content.

Sure, this should also work within Gramps on UNIX-like systems, by
reusing the thumbnails the OS generates for pdf files. But that would
be a custom hack, not something common and proper for all supported
operating systems.

Is there any common way of generating a preview of a PDF file?


J.
 

Re: image or PDF

Peter (chamdo4ever)
In reply to this post by Ron Johnson
On Sun, Mar 29, 2015 at 2:25 AM, Ron Johnson <[hidden email]> wrote:
> However such PDF scanning requires a link to quite powerful OCR, which Linux
> doesn't have;

I could be wrong (and I don't have things set up to test it at the
moment), but I seem to recall getting decent OCR results on Ubuntu
using VueScan: http://www.hamrick.com

Yes, it is closed source and yes you need to pay for it, but VueScan
has served me incredibly well using some old but high resolution
scanners on Ubuntu. It's the one bit of closed source software that I
was happy to pay for and have had installed for years. I'm pretty sure
I did some OCR scans using it as well but I'm not 100% sure.

Peter


Re: image or PDF

jerome
About OCR, we made some quick tests in the past.
See https://gramps-project.org/wiki/index.php?title=OCR

You could find some python modules (GObject introspection?)
using these engines.


Re: image or PDF

Kenneth Browne
In reply to this post by Gazza
On 03/29/2015 04:20 AM, Garry Seeley wrote:
> I am tending to think the same as you Sebastian and I will go for jpg
> (maybe after scanning and editing photos in some other format first).
> But I do have some documents that are multi page that I would like to
> attach complete. My options in that case would be to attach several jpg
> files and name them something like ...page1/2.jpg, ...page2/2.jpg.
I've done this in the past when using TMG v.8. In fact, if I remember
correctly, TMG choked on a multipage PDF or simply wouldn't display any
PDF files. OCR would have been useless anyway, since many of the pages
were handwritten affidavits concerning my ggg gfather's U.S.
Revolutionary War pension. Since my "original" document was a
downloaded PDF, I had to center each page on my laptop screen and make
a screenshot that I saved as JPG.

Ken


Re: image or PDF

Josip
In reply to this post by Paul Franklin-5
On 29.3.2015 at 9:59, Paul Franklin wrote:
> On 3/29/15, Garry Seeley <[hidden email]> wrote:
>> ... I notice that a thumbnail and preview are available with a jpg
>> file but just a generic thumbnail and no preview for a pdf. ...
>
> https://gramps-project.org/bugs/view.php?id=8161
>

We can handle both with poppler!
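
For illustration only: Poppler also ships command-line tools, and `pdftoppm` can render the first page of a PDF to an image that could serve as a thumbnail or preview. This is not how Gramps does or would do it (an in-application solution would presumably use Poppler's library bindings); the function name, the default size, and the output naming are assumptions.

```shell
# pdf_thumbnail FILE.pdf [SIZE]
# Render page 1 of FILE.pdf to FILE_thumb.png, fitted in a SIZE x SIZE pixel box.
# Assumes pdftoppm (from the Poppler utilities) is on PATH.
pdf_thumbnail() {
    pdf=$1
    size=${2:-256}
    # -f/-l limit rendering to the first page; -singlefile drops the page
    # number suffix, so the output is exactly <root>.png.
    pdftoppm -f 1 -l 1 -singlefile -png -scale-to "$size" "$pdf" "${pdf%.pdf}_thumb"
}
```

Calling `pdf_thumbnail certificate.pdf 128` would be expected to write `certificate_thumb.png`.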


--
Josip


Re: image or PDF

Gazza
In reply to this post by Kenneth Browne
Yes, I do not see the need to OCR and produce a true PDF document when I
am just attaching an image to a citation. I see adding an
image as part of recording the details of a source/citation, as further
proof that the source exists, and I don't see that I would need an
editable document in that case. But the discussion on OCR and producing
true PDF documents has been interesting and has helped me understand
more about PDF.

I have been using ImageMagick to extract images from PDF files and also
to concatenate those images into a single JPG file. Here are my notes:

Use ImageMagick to save PDF pages as separate images

Open the PDF file in ImageMagick
Save as format JPG, e.g. 1.jpg, and set quality to 100 when asked
File - Next to view the next page
Save as format JPG, e.g. 2.jpg, and set quality to 100 when asked
etc. for the rest of the pages


From the terminal, concatenate the images

$ montage -mode concatenate 1.jpg 2.jpg .....  result.jpg

Use the option -tile <columns>x<rows> to control the layout to be
applied. Either side may be omitted, and montage will figure out how to
meet the constraints.
e.g. $ montage -mode concatenate -tile 1x ..........
will concatenate the images in a single column
e.g. $ montage -mode concatenate -tile x1 ..........
will concatenate the images in a single row
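
The page-by-page saving can probably also be done from the terminal. The sketch below assumes ImageMagick's `convert` can read PDFs on your system (it relies on Ghostscript for that, and some distributions restrict PDF handling in ImageMagick's policy file). The function name, the 150 dpi density, and the temporary file names are illustrative assumptions.

```shell
#!/bin/sh
# pdf_to_single_jpg IN.pdf OUT.jpg
# Render every page of IN.pdf to a JPG, then stack the pages in one column.
pdf_to_single_jpg() {
    pdf=$1
    out=$2
    tmp=$(mktemp -d) || return 1
    # -density sets the rendering resolution; %03d numbers the page files.
    convert -density 150 "$pdf" -quality 100 "$tmp/page-%03d.jpg" || {
        rm -rf "$tmp"
        return 1
    }
    # Same montage call as in the notes above, with -tile 1x for a single column.
    montage -mode concatenate -tile 1x "$tmp"/page-*.jpg "$out"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

For example, `pdf_to_single_jpg pension.pdf pension.jpg` would replace the manual Save and File - Next steps; swapping `-tile 1x` for `-tile x1` would lay the pages out in a single row instead.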

Cheers.



Re: image or PDF

paul womack
In reply to this post by Gazza
For photographs (loosely defined) I would use
JPEG. If you're worried about quality, use a high quality number
(e.g. 90 or more in Gimp). This, while technically "lossy",
doesn't lose anything you need (that's the whole point
of all the research the "Joint Photographic Experts Group" did).

In the case of multipage documents, the question is more difficult.
But multi page image formats are not widely supported, and PDF is.

I would therefore recommend using PDF. If your source is a large
PDF book (e.g. from Google Books), you can easily make
a new PDF containing just the pages you want (I tend to take the
frontispiece too). This can be done on Linux using pdftk.
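
As a rough sketch of that pdftk step: pdftk's `cat` operation takes page numbers and ranges and writes them to a new file. The wrapper name, file names, and page numbers below are hypothetical.

```shell
# extract_pages IN.pdf OUT.pdf RANGE...
# Copy the listed pages or ranges (e.g. 1 120-125) from IN.pdf into OUT.pdf.
# Assumes pdftk is installed and on PATH.
extract_pages() {
    in_pdf=$1
    out_pdf=$2
    shift 2
    pdftk "$in_pdf" cat "$@" output "$out_pdf"
}
```

For instance, `extract_pages book.pdf smith_entry.pdf 1 120-125` would keep page 1 (assumed here to be the frontispiece) together with pages 120 to 125.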

OCR'ing scanned documents is likely to be tricky. In the case of a
book or newspaper you might do OK, but with something like a form
(and we all deal with lots of forms...) the mixture of labels, data
and complex layout will defeat most OCR.

Finally, PDF is NOT (necessarily...) just a pointless wrapper around
JPEG. PDF supports multiple image compressions and formats,
including the rather impressive JPEG2000 part 6 stuff
for "compound images". This is the technology Google
uses for Google Books, and it's smart stuff.

If (one happy day) OCR emerges that is useful for this
kind of data, it can readily be retro-applied to your existing PDF.

  BugBear


Re: image or PDF

jerome
In reply to this post by jerome
Also, it seems that gramps4 might include a gramplet
(or whatever plugin) linked with 'ocrfeeder' [1][2]!

[1] https://wiki.gnome.org/Apps/OCRFeeder
[2] https://git.gnome.org/browse/ocrfeeder/


--------------------------------------------
On Sun 29.3.15, jerome <[hidden email]> wrote:

 Subject: Re: [Gramps-users] image or PDF
 To: "gramps-users" <[hidden email]>, "Peter" <[hidden email]>
 Date: Sunday 29 March 2015, 17:18

 About OCR, we made some quick tests in the past.
 See https://gramps-project.org/wiki/index.php?title=OCR

 You could find some Python modules (GObject introspection?)
 using these engines.

 --------------------------------------------
 On Sun 29.3.15, Peter <[hidden email]> wrote:

  Subject: Re: [Gramps-users] image or PDF
  To: "gramps-users" <[hidden email]>
  Date: Sunday 29 March 2015, 16:53

  On Sun, Mar 29, 2015 at 2:25 AM, Ron Johnson <[hidden email]> wrote:
  > However such PDF scanning requires a link to quite powerful OCR, which Linux
  > doesn't have;

  I could be wrong (and I don't have things set up to test it at the
  moment), but I seem to recall getting decent OCR results on Ubuntu
  using VueScan: http://www.hamrick.com

  Yes, it is closed source and yes you need to pay for it, but VueScan has
  served me incredibly well using some old but high resolution
  scanners on Ubuntu. It's the one bit of closed source software that I was
  happy to pay for and have had installed for years. I'm pretty sure
  I did some OCR scans using it as well but I'm not 100% sure.

  Peter


Re: image or PDF

jerome
"Future

Work, life and other projects make it more and more difficult to find the time to work on OCRFeeder. I would nonetheless be happy to help anyone interested in contributing to it to give the first steps. I believe that OCRFeeder is a useful project and not only for accessibility purposes (although this is a great reason on its own!) so, if you like Python, GTK+, and want to help make this project better, drop me an email." ~August 5, 2014

http://www.joaquimrocha.com/2014/08/05/ocrfeeder-0-8-is-out/

Well, I suppose the Gramps project could have a look at OCRFeeder?
Maybe providing an addon would also be possible?



Re: image or PDF

Douglas Bainbridge
In reply to this post by Gazza
Note:
Also, when you open a multi-page PDF in GIMP you have the
option to import the pages as separate images (as well as layers).

Doug


On 30/03/15 02:59, Garry Seeley wrote:

> Yes I do not see the need to OCR and produce a true PDF document when I
> am just attaching an image to a citation. I see that I am adding an
> image as part of recording the details of a source/citation as further
> proof that the source exists and I don't see that I would need an
> editable document in that case. But, the discussion on OCR and producing
> true PDF documents has been interesting and has helped me understand
> more about PDF.
>
> I have been using ImageMagick to extract images from PDF files and also
> to concatenate those images into a single JPG file. Here are my notes:
>
> Use ImageMagick to save PDF pages as separate images
>
> Open the PDF file in ImageMagick
> Save as format JPG, e.g. 1.jpg, and set quality to 100 when asked
> File - Next to view the next page
> Save as format JPG, e.g. 2.jpg, and set quality to 100 when asked
> etc. for the rest of the pages
>
>
>   From the terminal concatenate images
>
> $ montage -mode concatenate 1.jpg 2.jpg .....  result.jpg
>
> Use the option -tile <columns>x<rows> to control the layout. Either
> side may be omitted and montage will work out how to meet the
> constraints.
> e.g. $ montage -mode concatenate -tile 1x ..........
> will concatenate the images in a single column
> e.g. $ montage -mode concatenate -tile x1 ..........
> will concatenate the images in a single row
>
> Cheers.
>
>
>
> On 29/03/15 23:34, Kenneth Browne wrote:
>> On 03/29/2015 04:20 AM, Garry Seeley wrote:
>>> I am tending to think the same as you Sebastian and I will go for jpg
>>> (maybe after scanning and editing photos in some other format first).
>>> But I do have some documents that are multi page that I would like to
>>> attach complete. My options in that case would be to attach several jpg
>>> files and name them something like ...page1/2.jpg, ...page2/2.jpg.
>> I've done this in the past when using TMG v.8. In fact if I remember
>> correctly TMG choked
>> on a multipage PDF or simply wouldn't display any PDF files. OCR would
>> have been useless
>> anyway since many of the pages were handwritten affidavits concerning
>> my ggg gfather's
>> U.S. Revolutionary War pension. Since my "original" document was a
>> downloaded PDF I had
>> to center each page on my laptop screen and make a screenshot that I
>> saved as JPG.
>>
>> Ken
>>
>
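
The ImageMagick notes quoted above could be scripted roughly as follows. This is only a sketch under some assumptions: it assumes the ImageMagick `convert` and `montage` commands are on the PATH (newer ImageMagick 7 installs may expose `convert` as `magick` instead) and that Ghostscript is available for reading PDFs. The function names, the `page` file prefix and the 300 DPI density are illustrative choices, not part of the original notes; only `-mode concatenate`, `-tile` and quality 100 come from them.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def pdf_to_jpg_command(pdf_path: str, prefix: str = "page",
                       density: int = 300, quality: int = 100) -> List[str]:
    """Build a `convert` call that writes one JPG per page of the PDF."""
    return [
        "convert",
        "-density", str(density),  # rasterisation resolution in DPI (assumed value)
        pdf_path,
        "-quality", str(quality),  # JPEG quality, 100 as in the notes
        f"{prefix}-%d.jpg",        # ImageMagick replaces %d with the page index
    ]


def montage_command(images: Sequence[str], output: str,
                    columns: Optional[int] = None,
                    rows: Optional[int] = None) -> List[str]:
    """Build the `montage -mode concatenate` call, with an optional -tile layout."""
    if not images:
        raise ValueError("at least one input image is required")
    cmd = ["montage", "-mode", "concatenate"]
    if columns is not None or rows is not None:
        # Either side of <columns>x<rows> may be left empty, e.g. "1x" or "x1".
        left = "" if columns is None else str(columns)
        right = "" if rows is None else str(rows)
        cmd += ["-tile", f"{left}x{right}"]
    return cmd + list(images) + [output]


def run(cmd: List[str]) -> None:
    """Run a command built above, failing early if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; is ImageMagick installed?")
    subprocess.run(cmd, check=True)


# Example use (file names are hypothetical):
# run(pdf_to_jpg_command("certificate.pdf"))
# run(montage_command(["page-0.jpg", "page-1.jpg"], "result.jpg", columns=1))
```

Keeping the command building separate from `run` means the argument lists can be printed and checked before anything touches the files. With `columns=1` the pages are stacked in a single column, matching the `-tile 1x` case in the notes.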

