AppStore scraping – back to the drawing board

I had assumed that in a browse URL such as http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/6014/7001/1, the last number was some kind of paging option, with each page returning up to 2500 apps. iTunes only seems to display that number of items per category, and that’s the format of the URL it uses, so it made sense. But having actually tried it, I find that the XML returned is the same regardless of the number at the end. It returns an error if you leave the number off, but put in anything from 0 to 99 and you get the same list of apps.

Which is kind of a pain, because that leaves a lot of apps unreachable. I can get to around 35000 using the browse method, but according to apptism there are currently around 49000 apps, so roughly 14000 are missing. The only way round this that I can see is to abandon the browse approach, start from the front page link for each category and page through the results 20 at a time. It’s probably going to be slow, but I don’t see any other choice at the moment. Of course I’ll report my findings here.
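
For anyone who wants to repeat the check, here is a rough sketch of how it could be scripted. It is not the code I actually ran: the helper names (browse_url, http_fetch, group_by_response) and the User-Agent string are made up for illustration, and only the URL pattern comes from the example above. It requests the browse URL with different trailing numbers and groups them by a hash of the response body, so a single group means the number makes no difference.

```python
import hashlib
import urllib.request
from typing import Callable, Dict, Iterable, List

# URL pattern taken from the example in the post; {n} is the trailing number.
BROWSE_URL = ("http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/"
              "browse?path=/6014/7001/{n}")


def browse_url(n: int) -> str:
    """Build the browse URL ending in the given number."""
    return BROWSE_URL.format(n=n)


def http_fetch(url: str) -> bytes:
    """Fetch a URL and return the raw body.

    The User-Agent value is a placeholder assumption, not a known-good one.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "iTunes/8.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def group_by_response(fetch: Callable[[str], bytes],
                      numbers: Iterable[int]) -> Dict[str, List[int]]:
    """Group trailing numbers by the SHA-1 digest of the body they return."""
    groups: Dict[str, List[int]] = {}
    for n in numbers:
        digest = hashlib.sha1(fetch(browse_url(n))).hexdigest()
        groups.setdefault(digest, []).append(n)
    return groups


if __name__ == "__main__":
    groups = group_by_response(http_fetch, range(0, 100))
    # One group means every trailing number returned identical XML.
    print(len(groups), "distinct response(s)")
```

Passing the fetch function in as a parameter means the grouping logic can be tried against canned responses without hitting the store at all.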

6 Responses to “AppStore scraping – back to the drawing board”

  1. Tim says:

    Oh that sucks… I was using the same method as you but never got to check whether it really was pagination, because I was working on parsing the file and getting the DB up.

    Back to the drawing board indeed. (even though 35000 apps should get me started for now)

  2. paul says:

    You know, I’m pretty certain that the paging thing used to work, but it certainly didn’t when I came to actually use it. But then strange things have been going on with the app store lately. Yesterday about 50% of the apps gave me an error report that they were no longer available. Changing the user agent seemed to get past that but then I was seeing other random weirdness as well. Not sure if they’re trying to stop scrapers or the store is just experiencing growing pains.

  3. Tim says:

    If they wanted to stop scrapers, I’m sure they could find a sure way. The thing is that in the end scrapers are not really taking business away from them (besides a small commission), quite the opposite, so I surely hope that they see that themselves and let it be. They don’t have to make it easy (they surely don’t!) but I hope they won’t go out of their way to stop it.

  4. paul says:

    I’m sure Apple the organization realizes that scrapers are not a problem. I just wonder if once in a while some net ops person spots a lot of traffic coming from them (there are a few running now) and decides to do something to reduce the load. But you’re right, giving people better ways to find apps has got to help Apple and app makers in the long term, and trying to stop them would be harmful.

  5. Dawgfather says:

    Paul – it seems that iTunes has a limit of 2500 in the browser. I haven’t yet figured out which 2500 get into each category, but this is a real setback for app developers. Customers will have to know the app name to find it.

  6. paul says:

    Hi – yes indeed. Although scanning through a list of 2500 apps is hardly user friendly either. Well, actually all apps are reachable, but it may take a lot of clicking to find one. Basically once your app drops off the front page you’re pretty much on your own, which I guess is why so many 3rd party sites have sprung up. The problem isn’t just for developers: as a customer I find using the iTunes store front incredibly painful. My favourite site right now is the MacWorld one, which has a lot of nice sort and filter options.
