Posts Tagged ‘itunes appstore’

AppStore scraping – back to the drawing board

Tuesday, June 9th, 2009

I had assumed that in a browse URL such as http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/6014/7001/1, the last number was some kind of paging option, with each page returning up to 2500 apps. iTunes only seems to display that number of items per category, and that’s the format of the URL it uses, so it made sense. But having actually tried it, I find that the XML returned is the same regardless of the number at the end. It returns an error if you don’t put a number, but put anything from 0 to 99 and you get the same list of apps. Which is kind of a pain, because that leaves a lot of apps unreachable. I can get to around 35000 using the browse method, but according to apptism there are currently around 49000 apps. The only way round this that I can see is to abandon the browse approach, scrape from the front page link for each category, and page through 20 at a time. It’s probably going to be slow but I don’t see any choice at the moment. Of course I’ll report my findings here.
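
The check described above (does the trailing number change the response?) can be sketched roughly as follows. This is an illustration rather than the code used for this post: `count_distinct_responses` is a made-up name, and `fetch` is assumed to be any function you supply that takes a URL and returns the response body as bytes.

```python
import hashlib

# Browse URL from the post, minus the trailing number.
BASE = "http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/6014/7001/"


def count_distinct_responses(fetch, suffixes):
    """Request BASE + n for each n in suffixes and count the distinct bodies.

    fetch is a caller-supplied function (an assumption of this sketch)
    that takes a URL string and returns the response body as bytes.
    """
    digests = set()
    for n in suffixes:
        body = fetch(BASE + str(n))
        # Hash each body so large responses can be compared cheaply.
        digests.add(hashlib.sha1(body).hexdigest())
    return len(digests)
```

If the last number really were a page index, `count_distinct_responses(fetch, range(100))` would come back greater than 1. A result of 1 means every suffix returned the same list.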

MacWorld AppGuide

Wednesday, June 3rd, 2009

MacWorld have launched their appguide site for iPhone apps. I think this raises the bar on appstore scrapers: it displays a lot of information and makes it simple to find cool apps. The editorial content adds a lot of value too, since there are just too many apps out there to navigate without some kind of guidance. And they’re giving away iTunes gift cards for reviews. Of course they can’t restrict reviews to purchasers only, as Apple can, so it’s open to abuse, but historically they’ve been pretty hard on spammers and abusers so we’ll have to give them the benefit of the doubt. Check it out.

Apple blocking curl from the Appstore?

Tuesday, May 19th, 2009

Not quite sure what’s going on with the AppStore. I just resumed my experiments and it appears that a couple of things have changed. Firstly, calls from curl seem to be blocked, although changing the user agent seems to get round that. Why they would impose such a trivially bypassed hurdle is a bit of a mystery. Surely if the block has a particular target there are better ways to keep them out, like IP address blocking. It is interesting that they aren’t moving to impose a total block on non-iTunes clients, though. Clearly that is a tacit admission that they are allowing store scraping at some level. More seriously, some of the browse URLs I was using previously don’t appear to work any more. I’m sure I can figure out what’s going on but I’m going to need more time than I have now to investigate. I’ll post back as soon as I figure it out.
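
For anyone scripting this rather than using curl, overriding the user agent looks roughly like the sketch below, which uses Python’s standard library. The `USER_AGENT` value is only a placeholder, since this post doesn’t say which string works, and `build_request` and `fetch` are illustrative names.

```python
import urllib.request

# Placeholder value: which user agent string the store accepts is not
# stated in this post, so substitute whatever works for you.
USER_AGENT = "ExampleClient/1.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that sends user_agent instead of the library default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Fetch url with the overridden user agent and return the body as bytes."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read()
```

Keeping `build_request` separate from `fetch` makes it easy to confirm the header is set without touching the network.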

Scraping the iTunes AppStore part ii – categories

Tuesday, April 21st, 2009

My previous post on scraping the iTunes Appstore showed how to retrieve the top level page for iPhone apps – basically a list of the available categories and their associated IDs. The next step is to get a list of the applications in each category.
The previous call gave us a URL:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewGenre?id=
and a list of name/value pairs like this:

<dict>
<key>itemName</key><string>Books</string>
<key>itemId</key><integer>6018</integer>
</dict>
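
As a minimal sketch, one of these fragments can be turned into a Python dictionary with the standard `xml.etree.ElementTree` module. It assumes each `<key>` is immediately followed by its value element, as in the fragment above, and `parse_dict` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# The fragment shown above, copied verbatim.
SAMPLE = """<dict>
<key>itemName</key><string>Books</string>
<key>itemId</key><integer>6018</integer>
</dict>"""


def parse_dict(xml_text):
    """Convert a flat <dict> of key/value element pairs into a Python dict."""
    children = list(ET.fromstring(xml_text))
    result = {}
    # Children alternate: key, value, key, value, ...
    for key_el, value_el in zip(children[0::2], children[1::2]):
        value = value_el.text
        if value_el.tag == "integer":
            value = int(value)
        result[key_el.text] = value
    return result
```

Calling `parse_dict(SAMPLE)` gives a dictionary with `itemName` and `itemId` entries. Nested dictionaries and arrays in the full response would need more handling than this.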

It looks like we need to combine the URL we just received with the IDs to get a page containing links to all the apps in that category, so for all books we would use:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewGenre?id=6081
but unfortunately I couldn’t get this to work. I have no idea what that URL is doing there. Instead I saw that iTunes was using a variant of the original URL, and sending this:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/36/6081/1/
This does return a list of the applications in the requested category, although it seems to be limited to 3500 results per category. That is enough for most categories right now. As the store gets bigger, more categories will exceed this; I would then expect new categories to be added, or the limit to be raised. If you really need all the apps in the store, you’ll have to use one of the other approaches I mentioned earlier.
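
A small helper for assembling browse URLs of this shape might look like the sketch below. The meaning of the individual path segments (the 36 and the trailing 1) is not established here, so the function simply joins whatever IDs it is given; `browse_url` is an illustrative name.

```python
BROWSE_BASE = "http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse"


def browse_url(*path_ids):
    """Build a browse URL whose path is the given IDs joined by slashes.

    The IDs are passed through as-is; what each segment means is an
    open question, so no validation is attempted.
    """
    path = "/".join(str(part) for part in path_ids)
    return f"{BROWSE_BASE}?path=/{path}/"
```

For example, `browse_url(36, 6081, 1)` reproduces the URL shown above.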
Again, the response is an XML message. It’s quite a complicated one, but my next post on the subject will show you how to get details of individual applications.