Scrape and download PDF files from Google or Bing using PowerShell

Here is a very simple script that you can run in the PowerShell ISE. It could probably be written much better, but it works. The script relies on the Google search engine's filetype: operator to find files of a specific type. The same approach should also work with the Bing search engine.
Before running the script, make sure the directory C:\temp\dwnld\ exists. You can easily change the regular expression pattern and the keywords.
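
If you would rather not create the folder by hand, a minimal sketch like the following should do it. It assumes the same path as the script and only creates the folder when it is missing:

```powershell
# Same download folder as the scraping script uses
$storageDir = "C:\temp\dwnld\"

# Create the folder (and any missing parent folders) only if it is not there yet
if (-not (Test-Path -Path $storageDir)) {
    New-Item -ItemType Directory -Path $storageDir -Force | Out-Null
}
```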

Comments with modifications to the script are always welcome ;)

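# One search is run per keyword
$keywords = @("manual", "microsoft", "powershell")
# Matches http and https links that end in .pdf
$pattern = 'https?://(.*?)[.]pdf'
# This directory must exist before the script runs
$storageDir = "C:\temp\dwnld\"
$filetype = "pdf"
$rand = New-Object System.Random

$keywords | ForEach-Object {
    # Build the search URL for this keyword, restricted to the chosen file type
    $urlToScrapeWithKeyword = "http://www.google.be/search?hl=nl&tbo=d&biw=1229&bih=677&output=search&sclient=psy-ab&q={0}+filetype%3A{1}&btnK=" -f $_, $filetype
    $urlToScrapeWithKeyword | Out-Default
    # Select-Object -Unique removes all duplicates; Get-Unique only removes adjacent ones
    (Invoke-WebRequest -UseBasicParsing -Uri $urlToScrapeWithKeyword).Links | Select-Object -ExpandProperty href | Select-Object -Unique | ForEach-Object {
        if ($_ -match $pattern) {
            $Matches[0] | Out-Default
            try {
                # -ErrorAction Stop turns transfer errors into exceptions so the catch block runs
                Start-BitsTransfer -Source $Matches[0] -Destination $storageDir -ErrorAction Stop
                "Download ok" | Out-Default
            } catch [exception] {
                "Download failed:" | Out-Default
                $_.Exception.Message | Out-Default
            }
            # Wait a random 20 to 42 seconds between downloads
            "Sleeping" | Out-Default
            Start-Sleep -Seconds ($rand.Next(20, 43))
        }
    }
}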
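
Before changing the regular expression, it helps to try it against a few strings without touching the network. The sketch below is only an illustration: the hrefs are made-up example.com/example.org values, and the pattern is a variant that accepts both http and https links.

```powershell
# Variant of the pattern that accepts http and https links ending in .pdf
$pattern = 'https?://(.*?)[.]pdf'

# Hypothetical hrefs, similar in shape to what a search results page might contain
$sampleHrefs = @(
    '/url?q=http://example.com/docs/manual.pdf&sa=U',
    'https://example.org/guide.pdf',
    '/search?q=powershell'
)

# Collect the matched PDF URL from every href that contains one
$found = foreach ($href in $sampleHrefs) {
    if ($href -match $pattern) { $Matches[0] }
}

$found
```

Only the first two sample strings contain a PDF link, so only those two URLs are printed; the surrounding redirect text is stripped because $Matches[0] holds just the matched part.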

Enjoy ;-)
Don’t forget, web scraping can be illegal! Use it with care!

Downloading files with PowerShell using the BitsTransfer module

@sstranger told me on Twitter that you can also use the BitsTransfer module to download files via PowerShell (previous post).

He wrote a script to download several files from the Microsoft website.

Here is a modified version of his script, using test files of 50 MB and 100 MB:

# path to store the data
$global:path = "c:\TEMP\"

# loads the BitsTransfer Module
Import-Module BitsTransfer
Write-Host "BitsTransfer Module is loaded"

# test data from http://www.thinkbroadband.com/download.html
$fileLinks = @("http://download.thinkbroadband.com/50MB.zip",
 "http://download.thinkbroadband.com/100MB.zip")

# start the download, one file at a time
foreach ($fileLink in $fileLinks)
{
 Start-BitsTransfer -Source $fileLink -Destination $path
}
Write-Host "Files are downloaded to $path"
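
To confirm that the files really arrived, you could check the folder afterwards. This is a rough sketch that assumes the same $path and $fileLinks values as the script, and assumes each file is saved under the last segment of its URL (which is what I would expect when the destination is a folder):

```powershell
# Same values as in the download script
$path = "c:\TEMP\"
$fileLinks = @("http://download.thinkbroadband.com/50MB.zip",
               "http://download.thinkbroadband.com/100MB.zip")

# Derive the expected file name from each URL
$fileNames = foreach ($fileLink in $fileLinks) { ([uri]$fileLink).Segments[-1] }

# Report whether each expected file exists, and its size if it does
foreach ($fileName in $fileNames) {
    $target = Join-Path $path $fileName
    if (Test-Path $target) {
        $sizeMB = [math]::Round((Get-Item $target).Length / 1MB, 1)
        Write-Host "$fileName found ($sizeMB MB)"
    } else {
        Write-Host "$fileName is missing from $path"
    }
}
```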

Save this script as PowerShellFileDownloader.ps1 and run it in the Windows PowerShell ISE (which is included by default in Windows 7).

If PowerShell refuses to run the script because of the execution policy, change the policy (read more about running PowerShell scripts here):

Set-ExecutionPolicy RemoteSigned

Warning: relaxing the execution policy can be a security risk.
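
One way to limit that risk is to change the policy for a narrower scope instead of the whole machine. A small sketch using the -Scope parameter of Set-ExecutionPolicy; which scope fits best depends on your setup, so treat it as a suggestion:

```powershell
# Show the policy that is currently set at each scope
Get-ExecutionPolicy -List

# Relax the policy for the current user only (does not need an elevated prompt)
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Alternative: relax it only for the current PowerShell session
# Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process
```

The Process scope is forgotten as soon as you close the PowerShell window, which makes it handy for a one-off run.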

Some screenshots of the script running with the BitsTransfer module:

Have fun! ;)