Modify httrack app ([login to view URL]). Source is located in [login to view URL]
The modifications requested are to adjust the copier to capture the additional information about images only and put it in a specific folder structure.
1) Capture the following metadata only for images in a separate XML file. The XML has the following format:
<?xml version="1.0"?>
<Image>
<src>[login to view URL]</src>
<pageURL>[login to view URL]</pageURL>
<alt>Spectacular: A time-lapse photo shows lightning bolts striking around the Puyehue-Cordon Caulle volcanic chain in Patagonia</alt>
<height>563</height>
<width>964</width>
</Image>
where
</src> tag is the image URL
</pageURL> tag is the URL of the page where the image is located
</alt> tag is the annotation to/title of the image, if available on the page
</height> tag is the image height
</width> tag is the image width
2) For xml file name use image [login to view URL] + xml format. For example if the image file name is [login to view URL] the XML file should be given a [login to view URL] name
3) Store xml and image files in the following directory structure, by domain:
<root_folder>
| --- <images>
| |
| | --- <domain_1>
| | --- <domain_2>
| …
|
| --- <metadata>
| |
| | --- <domain_1>
| | --- <domain_2>
| …