Media Verify Tool finds duplicate


Oldest1

For one of my people I have saved a page from the internet. It is an HTML file with a sidecar sub-directory.

The sidecar directory contains a pile of files, including CSS & other support for the HTML code.

It so happens to contain two (very small) files with different names but identical content (as reported by KDiff3).

Because the contents are identical, Media Verify reports the two files as duplicates. That is understandable, but I would not dare to remove either one, because it looks like both are used as part of the sidecar files, and renaming, of course, would not change anything.

My main concern is that this problem will grow over time as I save more such pages.

Is there any way to resolve this - other than to remember these two specific files?
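
For illustration, here is a minimal sketch of content-based duplicate detection, which is roughly the behaviour described above. It is not Gramps code: it assumes the tool compares files by a checksum of their content, and the SHA-256 choice and the `find_duplicates` name are assumptions of this sketch. It shows why renaming one of the two files would not make the report go away.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by the SHA-256 digest of their content.

    Hypothetical helper for illustration; file names play no part in
    the comparison, only the bytes inside each file.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Only digests shared by two or more files count as duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Two sidecar files with different names but the same bytes end up in the same group, so they would be reported together however they are named.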



_______________________________________________
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users
https://gramps-project.org

Re: Media Verify Tool finds duplicate

Dave Scheipers
Media Verify scours the subdirectories of the media folder set
in Preferences. The files will show in the duplicates section so long as
they are in one of these subdirectories. My media files are in:

C:\Users\Public\Genealogy\Media

I would put these HTML files in their own set of folders

C:\Users\Public\Genealogy\HTML-Files

These two files appear for one person. The same two files will
probably appear for more people you may download from the same source
multiplying the number of files Media Verify will flag as duplicates.
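
A minimal sketch of that move, assuming the browser saved the page as `<name>.html` with a sidecar directory called `<name>_files` next to it. The `_files` suffix and the `move_saved_page` name are assumptions of this sketch, not anything Gramps provides. The HTML file and its sidecar directory are moved together so the relative links between them keep working.

```python
import shutil
from pathlib import Path


def move_saved_page(html_file, archive_dir, sidecar_suffix="_files"):
    """Move a saved page and its sidecar directory out of the media tree.

    Hypothetical helper: the sidecar is assumed to sit next to the
    HTML file and to be named <stem> + sidecar_suffix.
    """
    html_file = Path(html_file)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    sidecar = html_file.with_name(html_file.stem + sidecar_suffix)
    # Move the page first, then the sidecar directory if there is one.
    moved = [Path(shutil.move(str(html_file), str(archive_dir / html_file.name)))]
    if sidecar.is_dir():
        moved.append(Path(shutil.move(str(sidecar), str(archive_dir / sidecar.name))))
    return moved
```

With the folders above, the call might look like `move_saved_page(r"C:\Users\Public\Genealogy\Media\page.html", r"C:\Users\Public\Genealogy\HTML-Files")`, where `page.html` is a made-up file name. If the page is already linked as a media object in Gramps, its path would presumably need updating afterwards.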

Just my initial thoughts; others may offer different solutions.

Dave
On Thu, Nov 15, 2018 at 5:33 PM <[hidden email]> wrote:

> Is there any way to resolve this - other than to remember these two specific files?

Re: Media Verify Tool finds duplicate

Oldest1

Looks like you had already run into this issue and found a work-around. :-) It's one option I hadn't considered yet, though as things stand I may have to go this way.

On 2018-11-15 3:08 PM, Dave Scheipers wrote:
> I would put these HTML files in their own set of folders
>
> C:\Users\Public\Genealogy\HTML-Files

Re: Media Verify Tool finds duplicate

paul womack
[hidden email] wrote:

> Is there any way to resolve this - other than to remember these two specific files?

I would suggest by-passing the problem; if you print to a PDF creation driver (save-as-PDF
if your browser supports it), the problem will simply go away. :-)

  BugBear




Re: Media Verify Tool finds duplicate

enno
On 15-11-2018 at 23:31, [hidden email] wrote:

> Is there any way to resolve this - other than to remember these two
> specific files?
>
My simple thought would be that you don't use the Media Verify tool to
add files to the Gramps database, so that it doesn't add this small
stuff to it. I mean, why would you need Gramps to manage those files, unless
you want to include them in your backups with media?

If you add the HTML file manually, I assume that when you activate the
media link, your browser will load all those extras, so I really see no
need to manage them with Gramps, unless for backup.

Another option, next to printing web pages to PDFs, would be to copy the
page to a document, with LibreOffice or whatever software you like. You
can then remove all ads, menus that you don't need, etc., and save the
whole thing in a single file.

Using documents or PDFs also has an advantage. If web standards
change, an updated browser may display saved HTML pages differently from
your current one, or not at all if they're deemed unsafe. You don't have
that problem when you freeze the page into a document or PDF.

Note that when you copy text from the web to LibreOffice, embedded
pictures may still be retrieved from the web, meaning that they will be
lost when the web site goes off-line. You can counteract that by copying
from the saved page, which will probably have the pictures saved too, so
that their links go to your disc.
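
One way to check a saved page for pictures that still point at the web is sketched below, using only Python's standard `html.parser` module. The `RemoteImageFinder` class and `remote_images` function are names made up for this sketch, and it only looks at `<img src=...>` attributes, so images referenced from CSS or scripts would be missed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collect <img> sources that point at the web rather than the disk."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # http(s) URLs and protocol-relative URLs (//host/...) are remote.
        if urlparse(src).scheme in ("http", "https") or src.startswith("//"):
            self.remote.append(src)


def remote_images(html_text):
    """Return the remote image URLs found in html_text, in document order."""
    finder = RemoteImageFinder()
    finder.feed(html_text)
    return finder.remote
```

An empty result for the saved page suggests its `<img>` tags all refer to local files, which is the situation you want before copying the content into a document.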

Regards,

Enno




Re: Media Verify Tool finds duplicate

Oldest1

Thank you, Enno,

Saving this sort of file/page to a separate single 'package', be it PDF or ODT or whatever, with all the data I want to keep (including making sure all images are 'local') and minus all the cruft & ads I don't want, is probably the best solution in the long run, and it will solve the issue that prompted this discussion.

On 2018-11-17 11:19 AM, Enno Borgsteede wrote:
> Another option, next to printing web pages to PDFs, would be to copy the page to a document, with LibreOffice or whatever software you like. You can then remove all ads, menus that you don't need, etc., and save the whole thing in a single file.




Re: Media Verify Tool finds duplicate

Ron Johnson

I print web pages to PDF for their nominal feature of "permanentness".

On 11/17/2018 02:39 PM, [hidden email] wrote:
> Saving this sort of file/page to a separate single 'package', be it PDF or ODT or whatever, [...] is probably the best solution in the long run

