
Building a database for all coins of a specific type: IT help requested


Roerbakmix


As my area of interest is fairly focused (I collect only sceattas, small silver coins minted between 685 and 750 AD), I have been toying with the idea of building a dataset of all sceattas that have been sold, found, or are held in public collections. I estimate the total number of coins in this dataset at ~7,500-10,000 (there are about 4,500 sceattas on ACsearch alone).

To keep it manageable, I would like to extract the following:

  • a photo of the coin 
  • weight
  • diameter
  • find location (if known)
  • description of the auction house / venue. This will allow me to add more fields later (such as auction, lot, provenance, etc). 

I will manually fill in the following:

  • the ID of the coin (e.g. Series X type 31)

This is of course a lot of work. I plan to use this database for (a) finding provenance and (b) research, e.g. die studies, mapping find locations, etc. As the database uses photos and text, some under copyright, I intend to keep it for private use only.

I am looking for the right program to build such a database. I have tried MS Access, but it is not ideal for photos. Suggestions are appreciated!

 


I use FileMaker Pro, which is now under Claris. https://www.claris.com

It is not free, but it works well for my needs (probably similar to yours), and I have been using it for years. I was thinking about changing to SQL (which can be free and has advantages), but I could not justify the time to move platforms. I am a Mac user, so no MS Access.

You will probably need several linked tables rather than one master table if you expect to sort dies, track die combinations, record multiple entries per coin from repeat sales, keep lists of hoards, etc. This will depend on how ambitious your project is. Good luck - this is an interesting topic.

 

PS: I am not an IT person, so better options likely exist.

Edited by Rand


From a software engineering perspective, the choice of database is one of the last you should make.

How you build your application will depend on your knowledge of coding and your intended use. For example, I'm a software engineer by trade, but I only need my own application for myself. Therefore, my "database" consists of a JSON file with metadata, and the images are simply links to files on my machine. Anything "relational" I just code myself. This is crude, but works.
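To make that concrete, the whole thing can be as small as something like this (a toy sketch - the field names, values, and paths are made up):

```python
import json
from pathlib import Path

# Toy sketch of a flat-file "database": one JSON file of metadata, with each
# image referenced by a path on disk rather than stored in the file itself.
records = [
    {"id": "sceat-0001", "series": "Series X type 31", "weight_g": 1.21,
     "venue": "Example Auction 12, lot 345", "image": "images/sceat-0001.jpg"},
    {"id": "sceat-0002", "series": "Series E", "weight_g": 1.05,
     "venue": "Example Auction 13, lot 67", "image": "images/sceat-0002.jpg"},
]
Path("coins.json").write_text(json.dumps(records, indent=2))

# "Relational" behaviour is just ordinary code over the loaded list.
coins = json.loads(Path("coins.json").read_text())
series_x = [c for c in coins if c["series"].startswith("Series X")]
print(len(series_x), "Series X coins")
```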

Therefore, the biggest question you'll need to answer first is: who will use your application?

  • You
  • You and a few friends

If you're not familiar with coding, there may be some online options you can build on. I agree that MS Access should not be a choice. However, if you're familiar with coding, you should put your database behind interfaces anyway so you can swap it out as necessary.

If anyone other than you will use it, you'll probably want some online solution. If it's truly just you, then you can follow my example and build something that only runs on your machine. Note that Windows Forms, though an old technology, are fairly easy to code and can bind to a dataset. This makes designing a simple UI very quick.


I have several DBs running under Access, one with 27,000 records. It's true that coding with Access was a nightmare at first (I started with Access 7, a long time ago!), but it's much better now... There's no problem with photos once you understand that you must not embed them in the database itself, but manage them with links.


1 hour ago, SimonW said:

@Roerbakmix, do you know the nomisma.org project? The software they use (Numishare) is open source: https://github.com/ewg118/numishare.

I was able to run Numishare on my laptop but found it extremely difficult to install.  Only a few museums have installed it.  I don't know of any individuals using it.

Although the software has the ability to enter coins, my understanding is that the ANS uses it only for searching and rendering coin databases.  (For data entry they use FileMaker.)

If you find anything let me know.


7 minutes ago, Ed Snible said:

I was able to run Numishare on my laptop but found it extremely difficult to install.  Only a few museums have installed it.  I don't know of any individuals using it.

Although the software has the ability to enter coins, my understanding is that the ANS uses it only for searching and rendering coin databases.  (For data entry they use FileMaker.)

If you find anything let me know.

Did you use Docker to install it? With Docker you should be able to get everything up and running in minutes. I haven't tried it yet, but will definitely give it a try for a project of my own when I find some time.


17 minutes ago, SimonW said:

Did you use Docker to install it? With Docker you should be able to get everything up and running in minutes. I haven't tried it yet, but will definitely give it a try for a project of my own when I find some time.

This was two years ago.  Things may have changed.  At that time, the provided _Dockerfile_ didn't work.  https://github.com/ewg118/numishare/issues/119 .  Let us know if things are working now.

I used Vagrant to install it.  It did not take minutes.  Instructions here: https://github.com/esnible/numishare-vagrant .  Running on a Mac under Vagrant I found it tedious to enter coins.  At that time I was looking for something that made it easy to enter coins by hand, so I did not pursue the work required to make it easy to install on a Mac.


I would think the biggest problem is extracting and processing the data in the first place. Getting the data into your format and saving down the images is where the work is. How will you get the ACSearch info into the right format without it being a huge manual job?

Any database will work fine as a 'back end'. With so few data fields, it's not really a database at all - even Excel would do it, and could in any case be 'imported' into something else later. In fact, I would go for as many fields as you can to start with, as it's a lot easier to join them back together than to split them out. Since each coin has one entry in each field, it isn't a database as such.

You wouldn't store the photos in the database. You put them in a folder and put the address of the folder/file in the database. That can be automated except for the specific file name, although you might be able to standardise that to make it a lot easier e.g. coinname-auctionhouse-dateofauction.jpg. If you store all the photos in the 'cloud' those can be accessed by anyone (or any database, or any reporting system) given permission for the folder. You can then build a 'report' (or web form) that brings up the search results. It also gives you a chance to expand what you're doing later if you think of other uses for the data.
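As a purely illustrative sketch (the names and folders are hypothetical), building such a standardised file name is easy to script:

```python
import re
from datetime import date
from pathlib import Path

def _slug(s: str) -> str:
    """Lower-case the text and strip anything that isn't a letter or digit."""
    return re.sub(r"[^a-z0-9]+", "", s.lower())

def photo_filename(coin_name: str, auction_house: str, sale_date: date) -> str:
    """Build a name of the form coinname-auctionhouse-dateofauction.jpg."""
    return f"{_slug(coin_name)}-{_slug(auction_house)}-{sale_date.isoformat()}.jpg"

# The database record then stores only this path (or the equivalent cloud URL).
path = Path("photos") / photo_filename("Series X type 31", "Example Auctions", date(2024, 1, 3))
print(path)  # photos/seriesxtype31-exampleauctions-2024-01-03.jpg
```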


Some of the people on this thread use "database" to mean "database engine" (e.g. Microsoft SQL Server, PostgreSQL, MySQL, or MongoDB). Others use "database" to mean a graphical user interface, such as FileMaker or Airtable, that integrates with a database.

@John Conduitt asks how to get the ACSearch info without it being a manual job.  I want to point out that the ACSearch terms at https://www.acsearch.info/subscribe.html forbid web scrapers.  This is above and beyond copyright, and you agreed to it if you created an account.  (Not to pick on ACSearch; other systems mostly have similar rules.)  If you have a programmer who can scrape data, you might be able to get permission if you ask nicely.  I prefer to do the work manually rather than creating scrapers and asking for permission to use them.

The last personal project I worked on had about 500 coins.  I used Excel.  I had some small scripts to produce interesting visual reports and coin images with text details.  If you are on your own, it will be hard to do 10,000 coins.  In the introduction to Lindgren III, the cataloger, Dr. Lindgren, says it took 2,000 hours to produce a catalog of 1,500 coins - more than one hour per coin to check the various references for Roman provincial coinage.  Can modern technology improve on that rate?  2,000 hours is 50 weeks times 40 hours a week - full-time work for a year for a retired doctor.  I suspect many sceattas can be IDed quickly, but some will take longer.

Another big source of sceatta data is the UK Portable Antiquities Scheme.  https://finds.org.uk/database/search/results/q/sceatta shows 2,299 sceattas.  I know they have latitude and longitude for many of them, although the website may restrict that information.  A few years ago I went to a conference in New Jersey hosted by The FLAME Project, https://coinage.princeton.edu/ .  The FLAME project has a map visualizer of coin findspots from "Late Antiquity".  I don't know if it is open to the public.  They integrated data from the Portable Antiquities Scheme.  It looked amazing - they could light up England based on where people were losing silver coins during any date range up to 725 AD.  I realize that cuts off most of the sceattas, but the point is that the academic community is gathering around the standards of nomisma.org to share data.


That is what I have done with Seleucids. There are around 2,000 entries; I stopped when my second child was born, but I will pick it up again. How do you plan to access the data? SQL commands are very useful and not very complex. I assume you want to run queries like "all coins minted in *…"; in that case a relational database is the way to go. John is right: importing the data will be a large effort.
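To give a feel for it, a minimal relational layout and the kind of query you would run might look like this (SQLite via Python here; every table and column name is only illustrative):

```python
import sqlite3

con = sqlite3.connect("sceattas.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS coins (
    coin_id     INTEGER PRIMARY KEY,
    series_type TEXT,      -- e.g. 'Series X type 31'
    weight_g    REAL,
    diameter_mm REAL,
    find_place  TEXT,
    image_path  TEXT       -- link to the photo on disk, not the photo itself
);
-- one coin can reappear in many sales, so sales get their own table
CREATE TABLE IF NOT EXISTS appearances (
    appearance_id INTEGER PRIMARY KEY,
    coin_id       INTEGER REFERENCES coins(coin_id),
    venue         TEXT,
    sale_date     TEXT,
    lot           TEXT,
    description   TEXT
);
""")

# 'All coins of a given series, with their sale history', newest sale first.
rows = con.execute("""
    SELECT c.series_type, c.weight_g, a.venue, a.sale_date, a.lot
    FROM coins AS c
    JOIN appearances AS a ON a.coin_id = c.coin_id
    WHERE c.series_type LIKE 'Series X%'
    ORDER BY a.sale_date DESC
""").fetchall()
for row in rows:
    print(row)
```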



Everything you see at rnumis.com I've built on top of web-hosted MySQL databases that I've created. I've written web interfaces so that I (and you, for the areas that are publicly permissioned) can access that information. I can also query the data tables directly with MySQL commands. There are a lot of different related pieces, so you can query and see something like this:

https://www.rnumis.com/greek_coins_detail.php?ref=KLN_20211029&refid=27

Everything is web hosted for access anywhere. I don't need to worry about my home computer blowing up. That's the easy part.

For sure, 99% of my rnumis time is spent extracting the information and images those tables will use, even with software I've written to speed things up.  

Steve


On 1/3/2024 at 8:45 PM, Ed Snible said:

This was two years ago.  Things may have changed.  At that time, the provided _Dockerfile_ didn't work.  https://github.com/ewg118/numishare/issues/119 .  Let us know if things are working now.

I used Vagrant to install it.  It did not take minutes.  Instructions here: https://github.com/esnible/numishare-vagrant .  Running on a Mac under Vagrant I found it tedious to enter coins.  At that time I was looking for something that made it easy to enter coins by hand, so I did not pursue the work required to make it easy to install on a Mac.

Thank you @Ed Snible, that's very helpful. I will let you know if it works once I get a chance to give it a try. I find the following passage in their "readme" funny:

The architecture is a major departure from typical LAMP (Linux, Apache, MySQL, PHP) content management systems, enabling far more sophisticated data models which allow for the creation of an advanced public user interface.

They should have stayed with the "typical LAMP content management systems". I don't see how Orbeon and an XML database are in any way superior to MySQL and PHPMyAdmin, for example.

 

6 hours ago, Ed Snible said:

@John Conduitt asks how to get the ACSearch info without it being a manual job.  I want to point out that the ACSearch terms at https://www.acsearch.info/subscribe.html forbid web scrapers.  This is above and beyond copyright, and you agreed to it if you created an account.  (Not to pick on ACSearch; other systems mostly have similar rules.)  If you have a programmer who can scrape data, you might be able to get permission if you ask nicely.  I prefer to do the work manually rather than creating scrapers and asking for permission to use them.

It's always those who scrape the web themselves who forbid scrapers, isn't it? 🙂 

No, but seriously, it literally takes years of work and effort to compile the amount of data that acsearch has, and of course this data will be protected. 

If you are looking for information about just a few hundred or thousand coins and don't need price details, feel free to contact us and we will try to help. After all, acsearch was created to help collectors and numismatists, which is also why we keep access to all data after 1999 (except price information) free of charge, while others limit it to the last 6 months.

Edited by SimonW

4 hours ago, SimonW said:

I don't see how Orbeon and an XML database are in any way superior to MySQL and PHPMyAdmin, for example.

To understand their approach read this paper by Ethan Gruber and Andrew Meadows: https://dlib.nyu.edu/awdl/isaw/isaw-papers/20-6/

The people who organize conferences and write grants like Linked Open Data.  They are less interested in JSON and API keys, which is what I see in the tech industry.

After reading early papers on Numishare and Nomisma, I was fired up to see what I could do with their APIs.  I quickly learned that the museum community makes those APIs available to each other, but not publicly.  I cannot execute a SPARQL query linking Wikidata with ANS coin data.  The ANS SPARQL endpoint is private.


1 hour ago, Ed Snible said:

To understand their approach read this paper by Ethan Gruber and Andrew Meadows: https://dlib.nyu.edu/awdl/isaw/isaw-papers/20-6/

The people who organize conferences and write grants like Linked Open Data.  They are less interested in JSON and API keys, which is what I see in the tech industry.

After reading early papers on Numishare and Nomisma, I was fired up to see what I could do with their APIs.  I quickly learned that the museum community makes those APIs available to each other, but not publicly.  I cannot execute a SPARQL query linking Wikidata with ANS coin data.  The ANS SPARQL endpoint is private.

Thanks for the link to this paper! Makes more sense now. It would be nice to have a website/platform where collectors can upload/share their coins within the Nomisma ecosystem (redacted of course, similar to RPC). Individuals launching their own Numishare applications as a contribution to the linked open data doesn't make much sense. And it's probably too clumsy to be used as a simple collection/gallery tool.

What kind of SPARQL queries did you try that didn't work? I've run a few here https://nomisma.org/sparql/ that seemed to work fine. I've also successfully tried some of their API endpoints here https://nomisma.org/documentation/apis/.

Edited by SimonW

 

@Roerbakmix everyone has mentioned the main things already, but I'd recommend following up with Simon to see if there's an easy way to get those listings from acsearch, as they will have almost everything you need. You can use a little regex to try to extract the mint, weight, and diameter from the descriptions; it won't be perfect, but you can probably get something that works in 90% of cases and manually check the rest. The auction house and details will be in their own fields, so no regex would be needed to extract those.
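Something along these lines might do it (the patterns here are rough guesses and would need tuning against real descriptions):

```python
import re

# Rough, illustrative patterns for pulling weight and diameter out of free-text
# auction descriptions. They won't cover every house style and will need tuning.
WEIGHT_RE = re.compile(r"(\d+[.,]\d+)\s*g(?:r|ram)?s?\b", re.IGNORECASE)
DIAMETER_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*mm\b", re.IGNORECASE)

def _num(match):
    return float(match.group(1).replace(",", ".")) if match else None

def parse_description(text: str) -> dict:
    return {
        "weight_g": _num(WEIGHT_RE.search(text)),
        "diameter_mm": _num(DIAMETER_RE.search(text)),
    }

desc = "Continental sceatta, Series E. 1.18 g, 12 mm. Found near Domburg."
print(parse_description(desc))  # {'weight_g': 1.18, 'diameter_mm': 12.0}
```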

7 hours ago, SimonW said:

It's always those who scrape the web themselves who forbid scrapers, isn't it? 🙂 

No, but seriously, it literally takes years of work and effort to compile the amount of data that acsearch has, and of course this data will be protected. 

If you are looking for information about just a few hundred or thousand coins and don't need price details, feel free to contact us and we will try to help. After all, acsearch was created to help collectors and numismatists, which is also why we keep access to all data after 1999 (except price information) free of charge, while others limit it to the last 6 months.

Thanks for mentioning this Simon, I might contact you about that sometime!


@Roerbakmix

My perspective as a non-IT person.

Extracting ACSearch data is the easiest part. ACSearch is exceptionally well designed and has a list of the auctions it covers. Each year there are about 300-400 coins of interest to me, which take about two hours to copy and paste into my database. It only costs €15 to get a week of price access. Although prices are of little interest to me, I tend to get access twice a year and see this as a token of appreciation to ACSearch.

Creating a suitable database has been far more time-consuming. MS Excel and MS Access quickly fell short of my needs. FileMaker is easy to use, fast, and lets you keep images in the database! Storing all images in a separate folder is the correct way for a proper database, but I tend to have many images per coin: there may be several photos of the same coin, I want to keep both the original and cropped images to help visualisation, and further images may include different parts of old catalogues or books. I copy and paste images, and the software does the resizing, etc.

The ability to share the data is an important consideration. I DO NOT plan to do this any time soon. The reason is the ownership of the images: I store them for personal use only, and I would have to consult each owner's policy before publishing or sharing them, which may be a significant undertaking. For example, I recently commissioned photos of coins from the British Museum (which is an excellent service). If I wanted to include a photo in a scholarly publication, I would still need to pay (total combined print run and download units, prices per image ex VAT): up to 500 units, £25; 501-1,000, £35; 1,001-2,000, £45; and so on. The last thing I want is to breach any ownership rights.

The most time-consuming part of the analysis is matching the dies and die links, calculating their rates and projected die numbers, and building die charts. A good database would greatly simplify this.

I am sure many on the forum from the STEM sector would easily create superb solutions, but my coding skills are basic, and off-the-shelf database software like FileMaker is the easiest solution for me.

Edited by Rand

On 1/3/2024 at 8:28 AM, Roerbakmix said:

As my area of interest is fairly focused (I collect only sceattas, small silver coins minted between 685 and 750 AD), I have been toying with the idea of building a dataset of all sceattas that have been sold, found, or are held in public collections. I estimate the total number of coins in this dataset at ~7,500-10,000 (there are about 4,500 sceattas on ACsearch alone).

To keep it manageable, I would like to extract the following:

  • a photo of the coin 
  • weight
  • diameter
  • find location (if known)
  • description of the auction house / venue. This will allow me to add more fields later (such as auction, lot, provenance, etc). 

I will manually fill in the following:

  • the ID of the coin (e.g. Series X type 31)

This is of course a lot of work. I plan to use this database for (a) finding provenance and (b) research, e.g. die studies, mapping find locations, etc. As the database uses photos and text, some under copyright, I intend to keep it for private use only.

I am looking for the right program to build such a database. I have tried MS Access, but it is not ideal for photos. Suggestions are appreciated!

 

Thinking about the original question a bit more, I see several practical challenges.

  • If you were solely interested in the photo, weight, diameter, and auction house, then there is little need to build such a system for your own use; you can just use ACSearch. Alternatively, as @SimonW has already mentioned, you can work with ACSearch and it should be simple to pull the data you need.
  • However, ACSearch rarely has the find location. Others have mentioned different systems that have that information, but you'd then need to have some unique identifier to correlate the data from the two databases. This is challenging because no such identifier exists. The photo may be different and even the size and weight may be slightly off. Of course, you can argue that if some other database has the find location, then it probably also has some other metadata and you should just work with that provider.
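On that correlation point, about the best you can do is fuzzy matching on the few shared numeric fields - a toy sketch, with arbitrary field names and tolerances:

```python
# One crude way (illustrative only) to surface candidate matches between two
# exports that share no identifier: compare the few numeric fields available,
# within a tolerance. Every candidate still needs a visual check of the photos.
def candidate_matches(coins_a, coins_b, wt_tol=0.03, diam_tol=0.5):
    pairs = []
    for a in coins_a:
        for b in coins_b:
            if (abs(a["weight_g"] - b["weight_g"]) <= wt_tol
                    and abs(a["diameter_mm"] - b["diameter_mm"]) <= diam_tol):
                pairs.append((a["id"], b["id"]))
    return pairs

acsearch_export = [{"id": "ac-101", "weight_g": 1.18, "diameter_mm": 12.0}]
findspot_export = [{"id": "find-XYZ", "weight_g": 1.19, "diameter_mm": 12.2}]
print(candidate_matches(acsearch_export, findspot_export))  # [('ac-101', 'find-XYZ')]
```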

Alternatively, you could keep it simple and just "mine what you need". For example, a die study will likely involve a very small number of types. I'm not familiar with the attributions of sceattas, but you could just search ACSearch for those specific types, and only add those to your database. That is a far simpler task. You could then focus your efforts on some code to better compare two photos to determine whether they are die matches. In programming terms this is called "lazy initialization" and there's nothing wrong with it. 🙂
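For the photo-comparison part, a perceptual hash is nowhere near a die-match detector, but it can be a cheap first pass for spotting re-used or near-identical photos of the same coin across listings (this sketch assumes the third-party pillow and imagehash packages; the file names are hypothetical):

```python
# A cheap first-pass filter, not a die-match detector: perceptual hashes flag
# near-identical images (e.g. the same photo recycled across sales), which is
# often where a provenance chain starts. Requires: pip install pillow imagehash
from PIL import Image
import imagehash

def hash_distance(path_a: str, path_b: str) -> int:
    """Hamming distance between perceptual hashes; small values mean very similar images."""
    return imagehash.phash(Image.open(path_a)) - imagehash.phash(Image.open(path_b))

# Hypothetical files; a distance of roughly <= 5 usually indicates the same image.
if hash_distance("photos/ac-101-obv.jpg", "photos/ac-2045-obv.jpg") <= 5:
    print("likely the same photograph - candidate for a provenance link")
```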

I guess the key here is to focus on the task you're trying to achieve. 



Thinking about this a bit further, the following could be built in a few hours:

  • Process watches a directory and waits for a new file to be added
  • When it is, it pops up a dialog with a single text box
  • You paste the text in there and it uses regexes to pull apart the data and write it somewhere
  • It then moves the file somewhere else and adds the path to the metadata

You can then traverse search results from ACSearch or wherever else, right-click and save the photo, and when the dialog pops up, paste the text from the listing. This should allow very quick entries. It should also work (depending on your regexes) on a variety of sites.

Personally, I would store the metadata in a JSON file and figure out from there where to put it for querying. I dislike storing my data in anything proprietary. As I previously mentioned, the database itself should be the least of your concerns; your top concern is getting the data.
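A rough sketch of that workflow - written here in Python for brevity, with a plain input() prompt standing in for the dialog, and illustrative paths and patterns:

```python
import json
import re
import shutil
import time
from pathlib import Path

WATCH = Path("incoming")   # save auction photos into this folder
STORE = Path("photos")     # where the files end up
META = Path("coins.json")  # flat metadata file

WEIGHT_RE = re.compile(r"(\d+[.,]\d+)\s*g\b", re.IGNORECASE)

def load_meta():
    return json.loads(META.read_text()) if META.exists() else []

def main():
    WATCH.mkdir(exist_ok=True)
    STORE.mkdir(exist_ok=True)
    while True:  # simple polling loop; a real tool might use a file-watcher library
        for img in sorted(WATCH.glob("*.jpg")):
            text = input(f"Paste the listing text for {img.name}: ")
            m = WEIGHT_RE.search(text)
            record = {
                "weight_g": float(m.group(1).replace(",", ".")) if m else None,
                "description": text,
                "image": str(STORE / img.name),
            }
            shutil.move(str(img), str(STORE / img.name))
            meta = load_meta()
            meta.append(record)
            META.write_text(json.dumps(meta, indent=2))
        time.sleep(2)

if __name__ == "__main__":
    main()
```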


On a side note, is there a way to get a list of past auctions from Biddr?

Biddr has many auctions from lesser-known/smaller dealers. They only offer a few coins of interest to me, but, being pedantic, I want to keep a record that I checked them.

Past auctions are listed on Biddr, but manually recording them one by one takes a lot of time, and I do not know how to automate this. 

Edited by Rand

For maximum flexibility I'd go with Steve's (@rNumis's) suggestion of MySQL - however, for most of what I think you are describing as your "uses", I would keep it as simple as possible (no software development expertise needed) and use Excel. Images are the biggest challenge in Microsoft Excel, but there is a function, IMAGE(), that can display an image from a URL (web address) - so if you put your images somewhere web-reachable, you can then work with your images nicely in Excel.
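If the photos sit at web-reachable URLs, those IMAGE() formulas can even be generated from a script - a sketch using the third-party openpyxl package (the URLs are placeholders, and note that very new Excel functions written this way may need Excel's internal _xlfn. prefix to avoid a #NAME? error):

```python
# Sketch: write a spreadsheet whose second column uses Excel's IMAGE() function
# to display each coin's photo from a web-reachable URL (placeholder URLs here).
# Requires: pip install openpyxl. Depending on the Excel build, formulas for very
# new functions may need to be written as "_xlfn.IMAGE(...)" to render correctly.
from openpyxl import Workbook

coins = [
    ("Series X type 31", "https://example.com/photos/seriesx-31-001.jpg"),
    ("Series E", "https://example.com/photos/seriese-042.jpg"),
]

wb = Workbook()
ws = wb.active
ws.append(["Type", "Photo"])
for row, (name, url) in enumerate(coins, start=2):
    ws.cell(row=row, column=1, value=name)
    ws.cell(row=row, column=2, value=f'=IMAGE("{url}")')  # stored as a formula string
wb.save("sceattas.xlsx")
```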

Edited by Sulla80

3 hours ago, Rand said:

Past auctions are listed on Biddr, but manually recording them one by one takes a lot of time, and I do not know how to automate this. 

The terms of Biddr prohibit scraping.  You agreed to those terms when you signed up for Biddr.

Eventually the people who run these sites (Biddr is run by the skayo corporation) will realize there is a small demand from researchers who want custom queries with CSV output, and they will start charging for that. Until then, we should follow the rules we agreed to.


3 hours ago, Ed Snible said:

The terms of Biddr prohibit scraping. 

Thank you. This is a very important point. I need to be doubly vigilant about when taking notes on publicly announced events becomes scraping. I do not keep any information from Biddr outside the watch list on my account there, so this would not be an issue this time.

To be clear, I was referring to legitimate means of obtaining such information, either from Biddr owners or any authorised third parties, with any charges as appropriate.

Are you saying that such information cannot be (legitimately) obtained, or are you simply not aware of a way to obtain it?

