Following up on my recent posts about scraping the iTunes App Store: I have made some progress towards decoding the browse URL that returns the list of apps by category. There is a slight wrinkle with categories that have subcategories (currently only Games), and a potential work-around for the 3500-per-page limit.
The browse URL breaks down to this:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/category/subcategory/page
The top-level browse URL, i.e.
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse
on its own gives a list of top-level categories and their associated ids, e.g. TV shows is 32, Music videos is 31, Music is 34 and AppStore is 36.
So to browse a category from the root, you append the query string path=/id to the URL. For example, the AppStore URL is
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/36
which returns a list of AppStore categories and their ids: Weather = 6001, Travel = 6003, Games = 6014, etc.
Then, to browse all weather apps the URL is
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/36/6001/1
where the final 1 seems to be a paging control, so where there are more than 3500 apps you can increment the last number to retrieve the next set of app details.
Where there are subcategories, they can be accessed by replacing the top-level id with that of the category, so to browse all Games subcategories the URL is
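The URL pattern described so far can be sketched in Python. This is only a minimal sketch: browse_url and iter_pages are names I have made up for illustration, and fetch_apps is a hypothetical callable you would supply yourself to download a URL and parse it into a list of apps. The stopping rule (a page past the end comes back empty) is an assumption, not something confirmed here.

```python
from typing import Callable, Iterator, List, Optional

BROWSE_URL = "http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse"


def browse_url(*ids: int, page: Optional[int] = None) -> str:
    """Build a browse URL from a sequence of ids and an optional page number."""
    segments = [str(i) for i in ids]
    if page is not None:
        segments.append(str(page))
    if not segments:
        # No ids at all: the bare URL lists the top-level categories.
        return BROWSE_URL
    return BROWSE_URL + "?path=/" + "/".join(segments)


def iter_pages(
    parent_id: int,
    category_id: int,
    fetch_apps: Callable[[str], List[str]],  # hypothetical: download + parse
    max_pages: int = 50,                     # arbitrary safety cap
) -> Iterator[List[str]]:
    """Yield successive pages of apps by incrementing the final path segment."""
    for page in range(1, max_pages + 1):
        apps = fetch_apps(browse_url(parent_id, category_id, page=page))
        if not apps:
            # ASSUMPTION: a page beyond the last one returns no apps.
            break
        yield apps
```

For example, browse_url(36) gives the AppStore category listing and browse_url(36, 6001, page=1) gives the first page of weather apps.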
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/6014
which returns the names and ids of the games subcategories (Action = 7001, Adventure = 7002, and so on). Then to browse the action games the URL becomes
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/6014/7001/1
It looks to me as though, currently, if the tree is traversed from the root until the list of subcategories comes back empty, and the leaf node is then used to retrieve the apps, there is no need for paging with a value greater than 1 (presumably because no leaf currently holds more than 3500 apps). This is also the only method I can see for determining which subcategory an app is listed under: the apps themselves link to a category and a genre but not a subcategory. I also don't know right now whether this will produce multiple instances of the same app, i.e. whether an app can appear under multiple subcategories.
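A rough sketch of that traversal, again in Python. The names iter_leaves, leaf_url and apps_in_several_leaves are my own, and list_children is a hypothetical callable that fetches path=/id and returns a name-to-id mapping (empty when the node has no subcategories). The app ids in the duplicate check are placeholders too; how an app is identified depends on what the response contains.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

BROWSE_URL = "http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse"


def iter_leaves(
    node_id: int,
    list_children: Callable[[int], Dict[str, int]],  # hypothetical fetch + parse
    parent_id: Optional[int] = None,
) -> Iterator[Tuple[Optional[int], int]]:
    """Walk down from node_id and yield (parent_id, leaf_id) for each leaf."""
    children = list_children(node_id)
    if not children:
        # An empty subcategory list marks a leaf.
        yield parent_id, node_id
        return
    for child_id in children.values():
        yield from iter_leaves(child_id, list_children, node_id)


def leaf_url(parent_id: int, leaf_id: int) -> str:
    """URL for the first (and, it seems, only needed) page of a leaf's apps."""
    return f"{BROWSE_URL}?path=/{parent_id}/{leaf_id}/1"


def apps_in_several_leaves(
    apps_by_leaf: Dict[int, Iterable[str]]
) -> Dict[str, List[int]]:
    """Map each app id seen under more than one leaf to the leaves it was in."""
    seen: Dict[str, List[int]] = defaultdict(list)
    for leaf_id, app_ids in apps_by_leaf.items():
        for app_id in set(app_ids):  # ignore repeats within one leaf
            seen[app_id].append(leaf_id)
    return {app: leaves for app, leaves in seen.items() if len(leaves) > 1}
```

Starting from iter_leaves(36, list_children), Weather would come out as (36, 6001) and Action games as (6014, 7001), which matches the two app-listing URLs above. Running apps_in_several_leaves over the collected results would answer the open question of whether an app can sit under more than one subcategory.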