Scraping the iTunes AppStore part ii – categories
My previous post on scraping the iTunes AppStore showed how to retrieve the top-level page for iPhone apps – basically a list of the available categories and their associated IDs. The next step is to get a list of the applications in each category.
The previous call gave us a url:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewGenre?id=
and a list of name/value pairs like this:
<dict>
<key>itemName</key><string>Books</string>
<key>itemId</key><integer>6018</integer>
</dict>
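As a rough sketch, pairs like these can be turned into a name-to-id lookup with Python's standard plistlib. The wrapper around the dict (XML header, plist and array elements) is my assumption about the shape of the full response, and genre_ids is just a name made up for this example.

```python
import plistlib

# Assumed shape: the <dict> entries sit inside an <array> in a plist document.
# Only the Books entry comes from the post; the wrapper is hypothetical.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<array>
  <dict>
    <key>itemName</key><string>Books</string>
    <key>itemId</key><integer>6018</integer>
  </dict>
</array>
</plist>
"""


def genre_ids(plist_bytes):
    """Map each category name to its numeric id."""
    items = plistlib.loads(plist_bytes)
    return {item["itemName"]: item["itemId"] for item in items}


print(genre_ids(SAMPLE))
```

If the real response nests the list more deeply, only the line that picks out items would need to change.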
It looks like we need to combine the URL we just received with the ids to get a page containing links to all the apps in that category, so for all books we would use:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewGenre?id=6081
but unfortunately I couldn’t get this to work. I have no idea what that URL is doing there. Instead I saw that iTunes was using a variant of the original URL, and sending this:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/36/6081/1/
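Both URL styles can be built from a category id with a couple of small helpers. This is only a sketch: the function names are mine, the "36" and trailing "1" in the browse path are copied verbatim from the request I observed (I don't know what they mean), and the fetch helper assumes the server will answer a plain HTTP GET.

```python
from urllib.request import urlopen

STORE_BASE = "http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/"


def view_genre_url(genre_id):
    """URL formed by appending the id to the viewGenre URL from the first call."""
    return f"{STORE_BASE}viewGenre?id={genre_id}"


def browse_url(genre_id):
    """URL in the browse?path= style that iTunes itself was sending."""
    # "36" and the final "1" are taken as-is from the observed request.
    return f"{STORE_BASE}browse?path=/36/{genre_id}/1/"


def fetch_category_xml(genre_id, timeout=30):
    """Download the raw XML for one category (assumes a plain GET is accepted)."""
    with urlopen(browse_url(genre_id), timeout=timeout) as response:
        return response.read()


print(browse_url(6018))
```

Passing the itemId from the dict above (6018 for Books) gives the URL to request for that category.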
This indeed returns a list of applications belonging to the requested category, although it seems to be limited to 3500 results per category. That is enough for most categories right now. As the store gets bigger, more categories will exceed this; I would then expect new categories to be added, or the limit to be raised. If you really need all the apps in the store, you’ll have to use one of the other approaches I mentioned earlier.
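If you are collecting results per category, a simple check can flag the ones that have probably been cut off. This is a minimal sketch: the 3500 figure is only the limit I observed, and hit_result_cap is a hypothetical helper that takes whatever list of apps you have parsed out of the response.

```python
RESULT_CAP = 3500  # limit observed in the responses, not a documented value


def hit_result_cap(apps, cap=RESULT_CAP):
    """Return True when a category's result list is as long as the cap,
    which suggests the server truncated it."""
    return len(apps) >= cap
```

A category that comes back with exactly 3500 entries is a good candidate for one of the other approaches.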
Again, the response is an XML message. It’s quite a complicated one, but my next post on the subject will show you how to get details of individual applications.
Tags: itunes appstore
I figure you solved that in the meantime, but I think you made a typo while trying to retrieve the list of apps in the Books category.
You wrote that you tried http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewGenre?id=6081 while the Books category is 6018.
The link http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewGenre?id=6018 works for me.
The second link you put that works doesn’t work for me though. If I change 6081 to 6018 it does, so I suppose you fixed it.
Thanks for the walk-through by the way!
Hi Tim – yes sorry about that, my fingers have mild dyslexia sometimes. Thanks for posting the correction.