
I'm building an advanced database



A website for researching past auctions of ancient coin lots featuring 3 million records with highly configurable searching/filtering options. No advertising or paid sections, fully free to use.
 
Long post: Although not a new project, this is the first time in years I'm making mention of it as previously it ran on a private computer that couldn't handle more than a handful of connections. After a lot of work, the data has finally been ported over today to a dedicated server. The site is in a state of ongoing development to add new features, fix bugs and hopefully optimize performance further.
 
The first thing that makes Coryssa special is that it's an indexed database. We're all familiar with the "normal" type of database that relies on a simple search bar to find relevant results. Indexing improves on this by pre-categorizing the records, allowing one to quickly home in on areas of interest. While a global search is arguably still the best method for finding a very narrow range of coins, or a specific lot, it is an inefficient way to browse a range of related records. It is the difference between a library and a pile of books.
 
Think of eBay or Amazon. You can go to the homepage and just use a keyword to find what you need but, with a few mouse clicks, you also have the option of navigating to niche areas to explore. With Coryssa this can be further narrowed by date, weight, specific auctioneers and other filters. The textbook example: imagine how difficult it would be on a conventional database to find asses of Augustus. The keyword "Augustus" forms part of a large number of Roman coin lot descriptions, while "as" is a very common word, so your search would pull in lots of irrelevant results. In Coryssa this can be accomplished by navigating from the home page to Roman Imperial > Augustus, then activating the tickbox under Denom and selecting As from the dropdown.
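
To make the asses-of-Augustus example concrete, here is a minimal Python sketch contrasting a keyword search with a filter on pre-categorized fields. The sample records, the field names (`period`, `ruler`, `denom`) and both functions are made up for illustration; they are assumptions, not Coryssa's actual schema or code.

```python
import re

# Hypothetical lot records: a free-text description plus pre-assigned index fields.
LOTS = [
    {"id": 1, "period": "Roman Imperial", "ruler": "Augustus", "denom": "As",
     "text": "Augustus. AE as, Rome mint. Bare head right."},
    {"id": 2, "period": "Roman Imperial", "ruler": "Augustus", "denom": "Denarius",
     "text": "Augustus. AR denarius. Same type as previous lot."},
    {"id": 3, "period": "Roman Imperial", "ruler": "Tiberius", "denom": "As",
     "text": "Tiberius. AE as struck for Divus Augustus."},
    {"id": 4, "period": "Roman Imperial", "ruler": "Trajan", "denom": "Sestertius",
     "text": "Trajan as Augustus. AE sestertius."},
]

def keyword_search(lots, *words):
    """Return ids of lots whose description contains every word (whole word, case-insensitive)."""
    hits = []
    for lot in lots:
        tokens = set(re.findall(r"[a-z]+", lot["text"].lower()))
        if all(word.lower() in tokens for word in words):
            hits.append(lot["id"])
    return hits

def indexed_filter(lots, **fields):
    """Return ids of lots whose index fields all equal the requested values."""
    return [lot["id"] for lot in lots
            if all(lot.get(key) == value for key, value in fields.items())]

# The keyword search matches every sample lot, because "as" and "Augustus"
# appear in all four descriptions for unrelated reasons.
print(keyword_search(LOTS, "augustus", "as"))

# The indexed filter returns only the lot categorized as an As of Augustus.
print(indexed_filter(LOTS, period="Roman Imperial", ruler="Augustus", denom="As"))
```

The point of the sketch is only that a filter on structured fields does not care what words happen to appear in the description.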
 
Another major feature has been the uploading of old catalogue data. While it's nice to have access to PDFs, nicer still is to have them all deconstructed into their constituent lots, with their text descriptions OCR'd so they can be searched along with all the other lots. The number of individual lots in these catalogs runs into the hundreds of thousands. This is no simple task, but as of today I've single-handedly managed to get over 68,000 lots in there, covering some 160 pre-Y2K sales dating back to the late 1800s. While my free time is limited I can still manage to enter a few hundred lots on a good day. My aim is to eventually get at least all the most famous sales uploaded.
 
Which leads me to the third important feature. The vision is for this to be a community-driven project. To become the ultimate research tool it will benefit most from its users getting involved. This can be as simple as uploading an interesting coin you came across or reporting an error or fake. I expect that some will be enthusiastic enough to want to take on improving specific areas of interest to them. This would be music to my ears as I'm already spread very thin. Ancient numismatics needs something akin to Wikipedia and this could become that.
 
Other features include:
- Ability to adjust for inflation
- Automatic averaging of price and weight for coins in your search set
- Multiple ways to sort categories and search sets
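
As a rough illustration of the first two items, here is a minimal Python sketch of adjusting prices for inflation and averaging price and weight over a search set. The price-index numbers are invented placeholders, not real inflation data, and the function and field names are assumptions rather than how Coryssa actually does it.

```python
from statistics import mean

# Hypothetical price-index values by year (illustrative placeholders, not real data).
PRICE_INDEX = {1990: 50.0, 2005: 75.0, 2020: 100.0}

def adjust_for_inflation(price, sale_year, target_year=2020, index=PRICE_INDEX):
    """Convert a price from sale_year money into target_year money."""
    return price * index[target_year] / index[sale_year]

def summarize(lots, target_year=2020):
    """Average the inflation-adjusted price and the weight over a search set.

    Lots with no recorded weight are skipped for the weight average only.
    """
    prices = [adjust_for_inflation(lot["price"], lot["year"], target_year) for lot in lots]
    weights = [lot["weight_g"] for lot in lots if lot.get("weight_g") is not None]
    return {
        "avg_price": mean(prices),
        "avg_weight_g": mean(weights) if weights else None,
    }

# A made-up search set of three lots.
search_set = [
    {"price": 100.0, "year": 1990, "weight_g": 10.0},
    {"price": 150.0, "year": 2005, "weight_g": 11.0},
    {"price": 300.0, "year": 2020, "weight_g": None},
]

print(summarize(search_set))
```

With these placeholder index values the 1990 and 2005 prices both scale to 200 in 2020 money, so the averages compare like with like.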
 
Initially, this website was launched as a means of corralling the huge amount of data I needed for my ERIC book series. To this end, I started sucking in all the Roman Imperial and Byzantine coins I came across. However, sensing that it could be useful to others (and with an eye to future monetization) I added support for other periods and launched it as "Coinvac" back in 2009. A few years later, during which time I couldn't manage to get so much as a single subscriber, I went back to the drawing board and relaunched it as Coryssa with no paywall, but left it running on autopilot for the most part.
 
Which brings me to the disadvantages. No point in overselling it: there are some important weaknesses. First of all, outside of my two specialty areas (Roman Imperial and Byzantine) the indexing is still largely accomplished by machine. The autopilot feeds in data and sorts it into categories according to pre-defined queries. It's roughly OK overall but, since I have limited experience in these periods and even less time to actively curate them, it's... messy. Expect to come across grossly mis-categorized records, oodles of fakes sucked in from eBay, missing images, garbled text, etc. I deleted the entire Medieval section for lack of interest and to save space, and with it went almost all Oriental coinage too. In an effort to improve the signal-to-noise ratio I've deleted large swathes of group lots and fakes (esp. eBay ones), and I have a standing policy of not even bothering with data entry of old catalog lots unless they're accompanied by photos. Plus it's actually getting worse. As I find myself with less and less free time I've gotten into the habit of outright ignoring non-RI/Byz periods to focus only on the data I need for ERIC, still the primary drive after all these years. This means that as I upload old catalogues nowadays I tend to just skip right over the Greek, Republican and Provincials and end right at the last Byzantine lot. So, effectively, the catalogs being entered are heavily skewed toward these two periods to the detriment of all others. It's just too much, sorry :-(
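
For a sense of why machine sorting by pre-defined queries gets messy, here is a minimal Python sketch of a rule-based categorizer. The rules, category names and function are hypothetical stand-ins, not the queries the site actually runs.

```python
# Hypothetical rules, checked in order: (category, keywords that must all appear).
RULES = [
    ("Roman Imperial > Augustus", ("augustus",)),
    ("Roman Imperial > Tiberius", ("tiberius",)),
    ("Greek > Lydia", ("lydia",)),
]

def categorize(description, rules=RULES, default="Uncategorized"):
    """Return the first category whose keywords all occur in the description."""
    text = description.lower()
    for category, keywords in rules:
        if all(keyword in text for keyword in keywords):
            return category
    return default

# A straightforward description lands in the right place.
print(categorize("Augustus. AE as, Rome mint."))

# A coin of Tiberius that merely mentions Augustus is caught by the first rule
# and ends up mis-categorized, which is the kind of error manual curation fixes.
print(categorize("Tiberius. AE as struck for Divus Augustus."))

# Anything no rule anticipates falls through to the default bucket.
print(categorize("Justinian I. AE follis, Constantinople."))
```

Reordering or tightening the rules fixes one case and tends to break another, which is why periods nobody curates by hand stay noisy.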
 
Secondly, searching in an indexed database with many tables and fields is very compute-intensive (at least with the current coding framework). Be a little forgiving of response times, especially in the first few days while the system is still caching and dealing with the expected initial wave of visitors. In time this will get better.
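
The idea that response times improve once the system has cached common searches can be sketched in Python with the standard library's `functools.lru_cache`. This is only an illustration of result caching in general; the function names are hypothetical and the site's real caching layer may work quite differently.

```python
from functools import lru_cache

# Counts how often the slow path actually runs.
CALLS = {"count": 0}

def run_expensive_query(period, ruler, denom):
    """Stand-in for a slow multi-table search; here it only counts invocations."""
    CALLS["count"] += 1
    return (f"{period} > {ruler} > {denom}",)

@lru_cache(maxsize=1024)
def cached_search(period, ruler, denom):
    """Serve repeated identical searches from memory instead of re-running them."""
    return run_expensive_query(period, ruler, denom)

# The first visitor pays the full cost; the second identical search is a cache hit.
cached_search("Roman Imperial", "Augustus", "As")
cached_search("Roman Imperial", "Augustus", "As")

print(CALLS["count"], cached_search.cache_info().hits)
```

A cold cache means every search takes the slow path, which is why the first days after a migration are typically the slowest.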
Then there is the fact that I quit coins during Covid and only got back into it last year. The effect on the database is that it stopped receiving new data past late 2020. As time allows the auctions that have closed since then will need to be uploaded but for now I'm mostly filling in the pre-Y2k era.
 
Another issue concerns certain types of coins which are borderline: those that could belong in more than one period, like some barbarous issues (is it Celtic or Roman?) or Iberian (do they belong in Greek or Celtic?). When dealing with series that could be either/or you might as well fall back to using global search as the preferred tool.
 
Lastly, I should also mention that I absolutely suck at marketing. If the project fails to gain traction from this single post (which I'll repost in a couple other places) then it will most likely continue to hobble along in obscurity. I have in the past, and might yet again, hire others to do data entry but I'd be discouraged if it was just a waste of money in a build-it-and-they-don't-come sort of way. Ideally, it will grow to become another well-known tool for ancient numismatics alongside ACSearch, OCRE and the rest. For my part, I don't need or want financial support. You'll notice there are no ads or pity pitches: account creation just gives you the benefit of adding comments to records, uploading coins and soon other features. Use the guest login if you don't care about these, you're not missing out on anything.
 
If you have questions, suggestions, bug reports, etc. you can reach me at my email rasiel5@gmail.com. There's also the old Coinvac group at https://www.facebook.com/coinvac which I'll need to figure out how to rename. Thank you and sorry for the very long post!
 
Rasiel


Thanks so much! I will investigate at length in the near future, after giving things some time to settle down. A couple of questions if I may: in addition to eBay sales and data from old auction catalogs, does the database include any non-eBay retail sales, as from platforms like VCoins and MA-Shops? Should I assume that you've omitted post-2000 auctions outside eBay, which are already largely covered by databases like ACSearch?


Thanks for all of the hard work. I’ve used it over the years and as someone who buys eBay coins occasionally, I’ve found it extremely helpful for finding seller images and notes I forgot to save. It also is helpful in finding sale prices, although as many are from eBay, those prices can be all over the board. 

Here’s a coin that I bought years ago (2016) as a cleaning project. I forgot to save the seller’s image and went back last year to find it. I’ve made some progress on cleaning the coin but it’s been stubborn!

[Attached image: IMG_6147.png]

Edited by Orange Julius

34 minutes ago, DonnaML said:

Should I assume that you've omitted post-2000 auctions outside eBay, which are already largely covered by databases like ACSearch?

No, actually the opposite: ACSearch, CoinArchives and Coryssa largely overlap for auction house sales for the period 2000-2020. There's basically zero after 2020 on my site so you'll need to use the others for recently closed auctions until I can catch up. On the other hand, Coryssa is as far as I know the only site to have a significant amount of pre-2000 lots. ExNumis.com has a large database but is (I think) only for internal use.

Rasiel


1 hour ago, Postvmvs said:

Many thanks Rasiel, this is a great resource!

Do you have the post-2020 eBay information saved but not uploaded, or did you stop scraping in 2020?

No sir :'-(
Sorry. However, even back when the last records were being added I was already struggling to add filters effective in stopping the fakes and low-value dregs. Three years later and it's at least three times worse. If I restart I might have to do something drastic like cherry-pick the handful of good accounts still selling. It really can't be automated at this point without a lot of constant pruning.


Rasiel, thank you so much for upgrading the server! You have built a fantastic resource that I use frequently and will now use more. Very useful for hunting down coins in my own collection but also for looking at other collections that were sold off on eBay.


13 hours ago, rasiel said:

No, actually the opposite: ACSearch, CoinArchives and Coryssa largely overlap for auction house sales for the period 2000-2020. There's basically zero after 2020 on my site so you'll need to use the others for recently closed auctions until I can catch up. On the other hand, Coryssa is as far as I know the only site to have a significant amount of pre-2000 lots. ExNumis.com has a large database but is (I think) only for internal use.

Rasiel

Thanks for the clarification. What about pre-2000 retail sales, whether on eBay (the "buy it now" sales) or elsewhere? Are any such sales included in your database?


@rasiel, this is a remarkable contribution!

I have been looking through Coryssa, and it will benefit my personal research (in archaic Lydian die studies) very much. Kudos.

I will do my main post-2000 auction searching on acsearch to keep load off of your computer, but this is an outstanding benefit to the community.

Edited by Bonshaw


@rasiel thank you very much for your efforts here. Even though my priority is ancient Greek (especially bronzes), a few searches indicated this will be very useful.

My advice would be to automate as much data entry as possible, because it's simply not possible to keep up with it by hand.


Thank you all for the kudos!

Donna: no, as of now no pre-2000 eBay records. In fact, the earliest ones are from 2008 😞

I can buy the data from eBay but last time I checked the price was extortionate.
