Any data scrapers for use with Gramps?


GRAMPS - User mailing list
Wondering if anyone has been experimenting with integrating a data scraper with Gramps?

I was just looking at the BeautifulSoup Python library after seeing a recommendation on another site. Has anyone used it with Gramps?

It looked really intriguing... until I saw the MIT license, which probably makes it unlikely to be considered for full integration. And without that possibility, our community probably won't dive in and figure out how to make it sing, as a data scraper or a citation generator.

Is there another option someone can suggest?

-Brian


--
Gramps-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gramps-users
https://gramps-project.org

Re: Any data scrapers for use with Gramps?

Bryan S
Where and how, exactly, would one use this code? App? Applet? Add-on?



On Thu, 2019-11-07 at 17:00 +0000, Emyoulation--- via Gramps-users wrote:

Re: Any data scrapers for use with Gramps?

Eric Doutreleau
In reply to this post by GRAMPS - User mailing list

Hi all,

I have used it to scrape data from the geneanet.org site.

Because this kind of scraping is not really accepted by the vast majority of Geneanet users, and because I'm not a good programmer :), I won't share any code publicly.

But I can share it privately if you're interested in having some examples.

On 07/11/2019 at 18:00, Emyoulation--- via Gramps-users wrote:

Re: Any data scrapers for use with Gramps?

Jeff D
I would love it if Ancestry.com provided APIs, but that will never happen.

-------- Original message --------
From: Eric Doutreleau <[hidden email]>
Date: 11/7/19 11:23 AM (GMT-06:00)
Subject: Re: [Gramps-users] Any data scrapers for use with Gramps?


Re: Any data scrapers for use with Gramps?

GRAMPS - User mailing list
In reply to this post by Bryan S
The possibilities are endless, limited only by our imagination: a tool, an external generator of importable data files, an add-on, a gramplet.

The prosaic use might be to scrape a page (FindAGrave, FamilySearch, WikiTree, Ancestry) for relatives, dates, events and places, and sequentially pre-populate the Data Entry gramplet, including a reference to the scraped source.

But that's elementary.
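The prosaic use described above starts with pulling facts out of a page's HTML. A minimal sketch follows, assuming a hypothetical page that lists each fact as a table row such as `<tr><th>Birth</th><td>1 Jan 1850, Ohio</td></tr>`; real sites each use their own markup, which would need its own handling. It uses the standard library's `html.parser` so it runs without extra packages (BeautifulSoup would shorten the parsing step). `FactTableParser` and `extract_facts` are illustrative names, not part of Gramps, and nothing here touches the Data Entry gramplet: the returned dictionary is simply the data one would then enter, together with the source URL for the citation.

```python
from html.parser import HTMLParser


class FactTableParser(HTMLParser):
    """Collect label/value pairs from <th>/<td> table rows (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._label = None  # text of the most recent <th>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != self._cell:
            return  # ignore closing tags of <b>, <tr>, etc.
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.facts[self._label] = text
            self._label = None
        self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._buffer.append(data)


def extract_facts(html, source_url):
    """Return the scraped facts plus the URL to cite as their source."""
    parser = FactTableParser()
    parser.feed(html)
    parser.close()
    return {"source": source_url, "facts": parser.facts}


if __name__ == "__main__":
    sample = "<table><tr><th>Birth</th><td>1 Jan 1850, Ohio</td></tr></table>"
    print(extract_facts(sample, "https://example.org/person/1"))
```

Keeping the source URL next to the facts mirrors the idea of including a reference to the scraped source with every entry.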

A couple of years ago, at the Big Data Analytics user group in Austin during SxSW, the inventor of MathCAD showed how to train that tool to do visual discrimination of images for data collection in a few (fewer than ten) lines of code. He used it to search a hard drive, and then the net, for any images having the specified characteristics. (In our session, we weren't imaginative: we started with a yellow ball, then ran a second search for a dog with a yellow ball. Both were written and executed in less than 3 minutes.)

The output was shown interactively, piped into a file, and used to drive a secondary process.

So, say you have a data scraper. You write a script to search a PDF (a county history book) for all unique occurrences of the names (and alternative names) of a Person ID and of all persons within 2 degrees of separation. You format the output to create a Source with a citation for each person, each with a 'ToDo' note. Each note could contain a generated list of the person's names and alternatives, with the page numbers hotlinked to the page and word inside the PDF. Then, to human-check the scraping, you click through the hotlinks in the list, the PDF displays the highlighted instance, and you delete or annotate the Note as appropriate.
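The county-history idea can be sketched as two small functions, assuming the PDF's text has already been extracted into one string per page (the extraction step is not shown). Person IDs and name variants are passed in as a plain dictionary; in practice they would come from the Gramps database, which this sketch does not access. The `#page=N` fragment is a convention many PDF viewers honor for opening a given page; highlighting the exact word would need viewer-specific support. `find_mentions` and `todo_note` are hypothetical names.

```python
import re


def find_mentions(pages, people):
    """Map each person ID to the sorted, unique (page, matched text) pairs.

    pages:  list of page texts; index 0 is page 1.
    people: hypothetical mapping of person ID -> list of name variants.
            Variants should start and end with letters, because the
            pattern matches on word boundaries.
    """
    mentions = {}
    for person_id, variants in people.items():
        # Longest variants first, so "John Smith" wins over a bare "John".
        ordered = sorted(variants, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(v) for v in ordered) + r")\b",
            re.IGNORECASE,
        )
        hits = {
            (page_no, match.group(0))
            for page_no, text in enumerate(pages, start=1)
            for match in pattern.finditer(text)
        }
        mentions[person_id] = sorted(hits)
    return mentions


def todo_note(person_id, hits, pdf_path):
    """Build a plain-text 'ToDo' note listing each hit with a page link."""
    lines = [f"ToDo: verify mentions of {person_id}"]
    for page_no, found in hits:
        lines.append(f"- p. {page_no}: {found} ({pdf_path}#page={page_no})")
    return "\n".join(lines)
```

Matching is case-insensitive but the note records the text as printed, which helps when checking each hit by hand.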

Another application of a scraper would be to generate a proximity ranking for a web page. If the page mentions multiple members of the family, the same person repeatedly, known dates or places, occupations, or any other criteria, it ranks higher, meaning the page is of more interest.
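A ranking like this could start as a simple weighted count of known terms. The sketch below is exactly that: a term-count score, not a true proximity measure, since it ignores how close the terms sit to each other. The term categories, the default weight of 1, and the 0.1 bonus per repeated mention are arbitrary assumptions chosen for illustration.

```python
import re


def interest_score(page_text, known_terms, weights=None):
    """Score a page by how many known family terms it mentions.

    known_terms: hypothetical mapping of category -> list of strings,
                 e.g. {"names": [...], "places": [...], "dates": [...]}.
    weights:     optional per-category weights (default 1.0 each).
    """
    weights = weights or {}
    text = page_text.lower()
    score = 0.0
    for category, terms in known_terms.items():
        weight = weights.get(category, 1.0)
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            count = len(re.findall(pattern, text))
            if count:
                # First mention earns the full weight; each repeat adds a tenth.
                score += weight * (1 + 0.1 * (count - 1))
    return score


def rank_pages(pages, known_terms, weights=None):
    """Return (url, score) pairs, most interesting page first."""
    scored = [
        (url, interest_score(text, known_terms, weights))
        for url, text in pages.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Raising a category's weight (say, places) is one way to express which kinds of matches matter most for a given search.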

I said that we weren't imaginative in the MathCAD demo. Our presenter then improvised several imaginative image-processing analytics applications, for which he was given access to several thousand images captured at a concert the previous night. It beggared the imagination.

-Brian

On Thu, Nov 7, 2019 at 11:22, Bryan S

Re: Any data scrapers for use with Gramps?

Bryan S
In reply to this post by Jeff D
Hmmm... fascinating, indeed.

Would the source need to be in PDF format? That is, could you use it on sources in many different formats?

What are its reliability and accuracy, in percent?




On Thu, 2019-11-07 at 12:10 -0600, digital0xff wrote:

Re: Any data scrapers for use with Gramps?

StoltHD
I use Zotero Standalone to store all the sources I find.

I use its web scraper to collect the data, and then add metadata when the web pages don't provide any.

Then I can easily copy a formatted citation string directly into any text field or note in Gramps and most other software I use. It also has add-ons for both MS Office and LibreOffice.

When I have documents for a source, I mostly store them "outside" Zotero and create file links; that way I can easily add the files to any other media tool I want.

There is a Python library for accessing Zotero libraries, but I think it is for the Zotero web site (I have never looked into it). However, there is also an API for the local standalone client that works as long as Zotero is running. In addition, Zotero uses an SQLite database.
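Reading the local SQLite file directly is one way to get at Zotero data without the web API. The sketch below assumes the table layout of Zotero 5's `zotero.sqlite` (`items`, `itemData`, `fields`, `itemDataValues`) as I understand it. This is not an official Zotero interface, the schema can change between versions, and it should be checked against a real copy of the database. Because Zotero may lock the file while it is running, working on a copy is likely the safer option; the helper opens the file read-only either way. The function names are hypothetical.

```python
import sqlite3
from pathlib import Path

# Assumed Zotero 5 schema: item fields are stored as
# items -> itemData -> (fields for the name, itemDataValues for the value).
TITLE_QUERY = """
    SELECT items."key", itemDataValues.value
    FROM items
    JOIN itemData ON itemData.itemID = items.itemID
    JOIN fields ON fields.fieldID = itemData.fieldID
    JOIN itemDataValues ON itemDataValues.valueID = itemData.valueID
    WHERE fields.fieldName = 'title'
    ORDER BY itemDataValues.value
"""


def open_read_only(db_path):
    """Open an SQLite file read-only via a URI, so nothing is ever written."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def zotero_titles(conn):
    """Return (item key, title) pairs, sorted by title."""
    return conn.execute(TITLE_QUERY).fetchall()
```

Usage would look like `zotero_titles(open_read_only("zotero-copy.sqlite"))`, where the file name is a placeholder for a copy of your own database.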

I am also trying out "Polar" and TagSpaces, in addition to the two notebook applications Joplin and Trilium, which both have web clippers. But for now, Zotero is my main source and bibliography tool.

It should be possible to create a Zotero add-on or a Gramps gramplet that accesses any field or document in Zotero and pastes it into the right fields in Gramps, but I'm not a developer, so I just use copy/paste at the moment.

It's easy to choose the citation format you want to use, and the copy/paste functionality creates the correct string for the format you have defined.
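The idea of picking a citation format and getting a ready-made string can be illustrated with plain string templates. Zotero itself uses Citation Style Language (CSL) styles, which are far richer; the two templates, the style names, and the field names below are hypothetical stand-ins, and any missing field simply comes out empty.

```python
from collections import defaultdict

# Hypothetical citation templates keyed by style name.
STYLES = {
    "short": "{author}, {title} ({year})",
    "full": "{author}. {title}. {publisher}, {year}. {url}",
}


def format_citation(item, style="short"):
    """Fill the chosen template from an item's fields."""
    # defaultdict(str) turns any field the item lacks into an empty string.
    fields = defaultdict(str, item)
    return STYLES[style].format_map(fields).strip()
```

Switching `style` changes only the template, not the stored data, which is the same separation that makes format switching easy in a reference manager.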

Jaran

On Thu, 7 Nov 2019 at 19:43, Bryan S <[hidden email]> wrote:

Re: Any data scrapers for use with Gramps?

GRAMPS - User mailing list
Zotero was also intriguing. There were recent Facebook posts about books being published on using it for genealogy. (Book releases are a good indicator that the audience for a product has grown.)

The book sites:

Any thoughts about OpenDocMan? The idea of my source documents living on the Apache-based hosting service I've been using since the turn of the century is very appealing. It would make sharing much easier. And I could post a Gramps-generated website there too.

And there are more options!

-Brian

On Thu, Nov 7, 2019 at 14:35, StoltHD <[hidden email]> wrote, regarding using Zotero with Gramps:



Re: Any data scrapers for use with Gramps?

StoltHD
I have not looked at OpenDocMan beyond reading the feature list. Because of the big differences between the paid and community editions, I decided it wasn't worth taking the time to test it.

If you tell me a little about what you are looking for, I might have some suggestions for document management too, for both server and local installations, e.g. OpenKM https://www.openkm.com/

It all depends on what you need.
I think you would need a really big collection of documents before you outgrow, e.g., Zotero. I have used it for 2-3 years now and have multiple databases: one for software research, holding approx. 1400 solutions I have found interesting, and one for my genealogy research, which holds approx. 2200 documents and sources, both files and websites.

You might also be interested in https://github.com/Langenscheiss/bibitnow


As a local document management solution, I find this one interesting (I have not tested it yet): http://jhierrot.github.io/openprodoc/index.html

There is also Openpaper https://openpaper.work/en-us/ (I have not tested this either).

Here are some other server solutions you could look into:
https://github.com/inveniosoftware/invenio and https://github.com/zenodo/zenodo
https://www.logicaldoc.com/download-logicaldoc-community
https://docs.mayan-edms.com/index.html
https://github.com/aegif/NemakiWare
https://github.com/the-paperless-project/paperless


But these solutions don't have web clippers or web scrapers; they are document management solutions only.

I have even more: some for investigative journalism, and some knowledge-graph solutions.

Omeka might also be of interest: https://omeka.org/


Jaran

On Fri, 8 Nov 2019 at 01:24, StoltHD <[hidden email]> wrote:

On Thu, 7 Nov 2019 at 22:39, Emyoulation--- via Gramps-users <[hidden email]> wrote: