Wondering if anyone has been experimenting with integrating a data scraper with Gramps?
I was just looking at the BeautifulSoup Python library after seeing a recommendation on another site. Has anyone used it with Gramps?
It looked really intriguing... until I saw the MIT license, which probably makes it unlikely to be considered for full integration. And without that possibility, our community probably won't dive in and figure out how to make it sing -- as a data scraper or a citation generator.
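For anyone who hasn't tried it, here is a minimal sketch of what BeautifulSoup usage looks like. It assumes the requests and bs4 packages are installed; the URL and the fields pulled out are only placeholders for whatever a citation generator would actually need:

# Minimal BeautifulSoup sketch: fetch a page and pull out the pieces a
# citation generator might care about. The URL is only a placeholder.
import requests
from bs4 import BeautifulSoup

url = "https://example.org/some-genealogy-page"
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string if soup.title else ""
meta = soup.find("meta", attrs={"name": "description"})
description = meta["content"] if meta and meta.get("content") else ""

print("Title:", title)
print("Description:", description)
print("Accessed at:", url)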
Where and how, exactly, would one use this code? App? Applet? Addon?
Hi to all,
I have used it to scrape data from the geneanet.org site.
Since this kind of scraping is not really accepted by most of the geneanet community, and since I'm not a good programmer :), I won't share any code publicly, but I can share some examples privately if you're interested in seeing them.
The possibilities are endless... limited only by our imagination: a tool, an external generator of importable data files, an add-on, a gramplet.
The prosaic use might be to scrape a page (FindAGrave, FamilySearch, WikiTree, Ancestry) for relatives, dates, events & places and sequentially pre-populate the Data Entry gramplet -- including a reference to the scraped source.
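A very rough sketch of that prosaic use with BeautifulSoup might look like this. The URL, the CSS selectors, and the idea of staging the result in a CSV that the Data Entry gramplet (or Gramps' CSV import) could pick up are all placeholders, not any real site's markup or any existing Gramps hook:

# Hypothetical sketch: scrape a person page and stage the values for
# data entry, keeping a reference back to the scraped source.
# The URL and CSS selectors below are placeholders.
import csv
import requests
from bs4 import BeautifulSoup

url = "https://example.org/memorial/12345"
soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

def text_of(selector):
    # Stripped text of the first matching element, or an empty string.
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else ""

record = {
    "name": text_of(".person-name"),
    "birth_date": text_of(".birth .date"),
    "birth_place": text_of(".birth .place"),
    "death_date": text_of(".death .date"),
    "death_place": text_of(".death .place"),
    "source": url,  # reference to the scraped source
}

# Stage the record in a CSV; a gramplet (or the CSV import) could then
# walk through it one person at a time.
with open("scraped_person.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=record.keys())
    writer.writeheader()
    writer.writerow(record)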
But that's elementary.
A couple of years ago, at the Big Data Analytics user group in Austin for SXSW, the inventor of MathCAD showed how to train that tool to do visual discrimination of images for data collection in a few (fewer than ten) lines of code. He used it to search a hard drive, and then the net, for any images having the characteristics called out. (In our session, we weren't imaginative: we started with a yellow ball, then did a second search for a dog with a yellow ball. Written and executed in less than 3 minutes.)
The output was shown interactively, piped into a file, and used to drive a secondary process.
So, say you have a data scraper. You write a script to search a PDF (a county history book) for all unique occurrences of the names (and alternative names) of a Person ID, and of all persons within 2 degrees of separation. You format the output to create a Source, with a citation for each person found and an attached 'ToDo' note. Each note could contain a generated list of that person's names/alternatives and the page numbers, hotlinked to the page and word inside the PDF. Then, to human-check the scraping, you click through the hotlinks in the list, the PDF displays the highlighted instance, and you delete or annotate the Note as appropriate. (A sketch of the search step follows.)
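A sketch of the search step, using the pypdf package -- the file name and the name list are placeholders, and turning the hits into actual Gramps citations and ToDo notes is left out:

# Sketch: find every page of a PDF on which each name of interest
# occurs, and print a hit list that could later become citations and
# 'ToDo' notes. The file name and name list are placeholders.
from collections import defaultdict
from pypdf import PdfReader

names = ["John Smith", "Jonathan Smith", "J. Smith"]  # person + alternatives
reader = PdfReader("county_history.pdf")

hits = defaultdict(list)  # name -> list of 1-based page numbers
for page_number, page in enumerate(reader.pages, start=1):
    text = (page.extract_text() or "").lower()
    for name in names:
        if name.lower() in text:
            hits[name].append(page_number)

for name, pages in hits.items():
    # Each line could become one ToDo-note entry, with the page numbers
    # turned into links such as county_history.pdf#page=12.
    print(name + ": pages " + ", ".join(str(p) for p in pages))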
Another application of a scraper would be to generate a proximity ranking for a web page: if the page mentions multiple members of the family, the same person repeatedly, known dates or places, occupations, or any other criteria, it ranks higher and is of more interest.
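A crude scoring pass along those lines might look like this; the criteria, the weights, and the URLs are invented purely for illustration:

# Crude relevance ranking: a page scores higher for every occurrence of
# known family names, dates, places or occupations.
import re
import requests
from bs4 import BeautifulSoup

criteria = {
    "smith": 1,        # surname
    "john smith": 3,   # full name of the person of interest
    "1847": 2,         # a known year
    "galveston": 2,    # a known place
    "carpenter": 1,    # a known occupation
}

def score_page(url):
    # Fetch the page, strip the markup, and count weighted term hits.
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(" ").lower()
    return sum(weight * len(re.findall(re.escape(term), text))
               for term, weight in criteria.items())

pages = ["https://example.org/page-a", "https://example.org/page-b"]
for url in sorted(pages, key=score_page, reverse=True):
    print(url)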
I said that we weren't imaginative in the MathCAD demo. Our presenter then improvised several imaginative analytics applications for image processing, using access to several thousand images captured at a concert the previous night. It beggared the imagination.
I use Zotero standalone to store all the sources I find...
I use their web scraper to collect data, and then add metadata when the web pages don't provide any.
Then I can easily copy a formatted citation string directly into any text field or note in Gramps and most other software I use. It also has add-ons for both MS Office and LibreOffice.
When I have documents for a source, I mostly store them "outside" Zotero and create file links; that way I can easily add the files to any other media tool I want.
There is a Python library for accessing Zotero libraries, but I think that is for the Zotero web site; I have never looked into it. There is also an API for the local standalone version that works as long as Zotero is running... and in addition it uses an SQLite database...
I am also trying out "Polar" and TagSpace, in addition to the two notebook applications Joplin and Trillium, which both have web clippers. But for now, Zotero is my main source and bibliography tool...
It should be possible to create an add-on/gramplet for Gramps that accesses any field and document in Zotero and pastes it into the right fields in Gramps (a rough sketch of the idea follows below), but I'm not a developer, so I just use the copy/paste feature at the moment...
It's easy to choose the citation format you want to use, and the copy/paste functionality creates the correct string for the format you have defined.
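For anyone who wants to experiment, the Python library in question is presumably pyzotero. A very rough, untested sketch of pulling item metadata and formatted citation strings out of a library might look like this; the library ID, the API key and the citation style are placeholders, and this talks to the Zotero web API rather than the local standalone one:

# Rough pyzotero sketch (placeholder library ID, API key and style).
from pyzotero import zotero

LIBRARY_ID = "1234567"      # your numeric Zotero user ID
API_KEY = "your-api-key"    # created at zotero.org/settings/keys

zot = zotero.Zotero(LIBRARY_ID, "user", API_KEY)

# Plain item metadata for the most recently added top-level items.
for item in zot.top(limit=5):
    data = item["data"]
    print(data.get("title", "(no title)"), "-", data.get("url", ""))

# Formatted bibliography entries (returned as short HTML snippets)
# that could be pasted into a Gramps note.
for entry in zot.top(limit=5, content="bib", style="apa"):
    print(entry)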
Jaran
On Thu, 7 Nov 2019 at 19:43, Bryan S <[hidden email]> wrote:
Hmmm... fascinating, indeed.
Would the source need to be in PDF format? That is, can you use it on sources in numerous formats?
What are the reliability and accuracy percentages?
On Thu, 2019-11-07 at 12:10 -0600, digital0xff wrote:
I would love it if ancestry.com provided APIs. But that will never happen.
Zotero was also intriguing. There were recent Facebook postings about books being published about using it for genealogy. (Book releases are a good indicator that the audience for a product has grown.)
Any thoughts about OpenDocMan? The idea of my source documents living on the Apache-based hosting service I've been using since the turn of the century is very appealing. It would make sharing much easier. And I could post a Gramps-generated website there too.
I have not looked at OpenDocMan, other than reading the features. Because of the big differences between the paid and community editions, I figured I wouldn't take the time to test it.
If you tell me a little about what you are looking for, it may be that I have some answers about document management as well... both server and local installations...
I think you would need a really big collection of documents before you outgrow e.g. Zotero... I have used it for 2-3 years now and I have multiple databases: one for software research, holding approx. 1400 solutions I have found interesting, and one for my genealogy research, which holds approx. 2200 documents and sources, both files and websites...