Scraping iTunes part iii – reading the categories list
My last post on scraping the iTunes Store showed how to read the categories page. We ended up with a URL something like this:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/browse?path=/36/6081/1/
(for books). Requesting that URL returns a large XML response. The interesting parts look like this:
<dict>
<key>artistId</key>
<integer>293260414</integer>
<key>artistName</key>
<string>Saxorama.net</string>
<key>buy-only</key>
<true/>
<key>buyParams</key>
<string>
productType=C&salableAdamId=294770918&pricingParameters=STDQ&price=0&ct-id=14
</string>
<key>genre</key>
<string>Productivity</string>
<key>genreId</key>
<integer>6007</integer>
<key>itemId</key>
<integer>294770918</integer>
<key>itemName</key>
<string>EasyWriter</string>
<key>kind</key>
<string>software</string>
<key>playlistName</key>
<string>EasyWriter</string>
<key>popularity</key>
<string>0.13890815</string>
<key>price</key>
<integer>0</integer>
<key>priceDisplay</key>
<string>Free</string>
<key>releaseDate</key>
<string>2009-04-06T07:00:00Z</string>
<key>s</key>
<integer>143441</integer>
<key>softwareIcon57x57URL</key>
<string>
http://a1.phobos.apple.com/us/r30/Purple/40/ce/15/mzl.dtewfrse.png
</string>
<key>softwareIconNeedsShine</key>
<false/>
<key>softwareSupportedDeviceIds</key>
<array>
<integer>1</integer>
</array>
<key>softwareVersionBundleId</key>
<string>net.sax.easywriter</string>
<key>softwareVersionExternalIdentifier</key>
<integer>1589121</integer>
<key>softwareVersionExternalIdentifiers</key>
<array>
<integer>875361</integer>
<integer>1472572</integer>
<integer>1486886</integer>
<integer>1589121</integer>
</array>
<key>url</key>
<string>
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=294770918&mt=8
</string>
</dict>
This is the set of summary details for one product, and there can be up to 3500 of them on the page that we just downloaded. While the information here is useful, we need to go to another page to get the complete set of information for the product. We do this by extracting a URL using an XPath statement, which I will post next time.
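In the meantime, here is a rough sketch of one way to read a single product entry and pull out its url value, using only Python's standard library. It is not the XPath statement promised for the next post, just a plain walk over the alternating key/value children of the dict. The names SAMPLE, convert and dict_to_python are made up for this illustration, the sample is a trimmed copy of the entry above, and the ampersand in the URL is written as &amp; on the assumption that the raw response is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the product entry shown above (hypothetical test input).
# Assumption: the raw response escapes "&" as "&amp;", as well-formed XML must.
SAMPLE = """<dict>
<key>buy-only</key>
<true/>
<key>itemId</key>
<integer>294770918</integer>
<key>itemName</key>
<string>EasyWriter</string>
<key>priceDisplay</key>
<string>Free</string>
<key>softwareSupportedDeviceIds</key>
<array>
<integer>1</integer>
</array>
<key>url</key>
<string>
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=294770918&amp;mt=8
</string>
</dict>"""


def convert(elem):
    """Convert one plist-style value element into a Python value."""
    if elem.tag == "integer":
        return int(elem.text)
    if elem.tag == "true":
        return True
    if elem.tag == "false":
        return False
    if elem.tag == "array":
        return [convert(child) for child in elem]
    if elem.tag == "dict":
        return dict_to_python(elem)
    # Treat everything else (e.g. <string>) as text; strip the
    # newlines that surround long values such as the URL.
    return (elem.text or "").strip()


def dict_to_python(dict_elem):
    """Turn a <dict> element into a Python dict.

    The children alternate: <key>, value, <key>, value, ...
    """
    children = list(dict_elem)
    result = {}
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        result[key_elem.text] = convert(value_elem)
    return result


product = dict_to_python(ET.fromstring(SAMPLE))
print(product["itemName"], product["url"])
```

On the real page you would run the same conversion over every product dict in the response and collect the url values to fetch the detail pages.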