OK, v8 is in the first post. This is, I guess, nearly as far as this can go (and it has made me realise that the allmusic data is not actually as good quality as I thought...).
I have added some more checking on albums (it now handles punctuation marks, as I was having issues matching some albums due to extra . or ! characters). I have also fixed the brackets now (I believe...) and cleaned up the code a fair bit.
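For anyone curious what "checking on albums" with punctuation might look like, here is a minimal sketch. It is not the v8 code: `normalise_title` and `titles_match` are hypothetical names, and the rules (drop accents, ignore case, delete punctuation such as `.` and `!`, collapse spaces) are assumptions about what a reasonable matcher would do.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Reduce an album title to a comparison key (hypothetical helper, not the v8 code)."""
    # Split accented letters into base letter + combining mark, then drop the marks (ö -> o).
    title = unicodedata.normalize("NFKD", title)
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Case-insensitive comparison.
    title = title.casefold()
    # Delete anything that is not a word character or whitespace (., !, ', brackets, ...).
    title = re.sub(r"[^\w\s]", "", title)
    # Collapse runs of whitespace into single spaces.
    return " ".join(title.split())


def titles_match(a: str, b: str) -> bool:
    """True when two titles reduce to the same key."""
    return normalise_title(a) == normalise_title(b)


if __name__ == "__main__":
    print(titles_match("Sgt. Pepper's Lonely Hearts Club Band",
                       "Sgt Peppers Lonely Hearts Club Band"))
```

Deleting punctuation outright (rather than swapping it for a space) keeps "Pepper's" and "Peppers" together, at the cost of merging things like "Re-Animator" into "reanimator".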
I guess there are a couple of questions really...
Do people actually want all artists scraping or just album artists?
We are currently only scraping main albums and compilations; do people want singles scraped as well? The quality seems to go downhill the further you get from main albums... Currently we pick up the first match, but I am noticing that some albums have multiple entries in the compilation tab. Singles are even worse: there are masses of entries (some of which look like rubbish). Most singles also have very few details anyway, so I am not sure of the value in scraping them.
The only improvement I was thinking of was to look through the artists that have duplicate entries and, if only one of those entries has years active set, pick that one. However, this could lead to a lot more false positives.
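A rough sketch of that duplicate-artist idea, just to make the rule concrete. `ArtistCandidate` and `pick_artist` are hypothetical, and it assumes each search result has already been parsed into a name, a URL and an optional years-active string. It only accepts the heuristic when exactly one candidate has years active set; otherwise it returns `None` rather than guessing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArtistCandidate:
    # Hypothetical record for one allmusic search result.
    name: str
    url: str
    years_active: Optional[str] = None  # None when no years active are listed


def pick_artist(candidates: List[ArtistCandidate]) -> Optional[ArtistCandidate]:
    """Return the single best candidate, or None when the choice is ambiguous."""
    if len(candidates) == 1:
        # No duplicates, nothing to disambiguate.
        return candidates[0]
    with_years = [c for c in candidates if c.years_active]
    # Only trust the heuristic when it singles out exactly one entry.
    if len(with_years) == 1:
        return with_years[0]
    # Zero or several entries with years active: give up rather than guess.
    return None
```

The false-positive risk mentioned above is still there: if the artist you actually want is the one *without* years active, this picks the wrong entry, so it would probably need to be an option rather than the default.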
Any feedback on what is still missing is appreciated.