Scrape and download PDF files from Google or Bing using PowerShell

Here is a very simple script that you can run from PowerShell ISE. It could probably be written much better, but it works. The script uses the Google search engine to search for a specific file type, then downloads every matching link it finds on the results page. This should also work with the Bing search engine.
To make the script work, make sure the directory C:\temp\dwnld\ exists. You can easily change the regular expression pattern and the keywords.
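
If you would rather not create the directory by hand, something like the sketch below could do it. The helper name Initialize-StorageDir is made up for this example and is not part of the original script.

```powershell
# Hypothetical helper (not in the original script): make sure the
# download directory exists before any transfer is started.
function Initialize-StorageDir {
    param([Parameter(Mandatory)][string]$Path)
    # -Force makes the call succeed when the directory already exists,
    # so the function is safe to run more than once.
    New-Item -ItemType Directory -Path $Path -Force | Out-Null
}
```

Calling Initialize-StorageDir -Path "C:\temp\dwnld\" once before the keyword loop should be enough.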

Comments with modifications to the script are always welcome ;)

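$keywords = @("manual", "microsoft", "powershell")
# Matches http and https links ending in .pdf (lazy, so it stops at the first ".pdf")
$pattern = 'https?://.*?\.pdf'
$storageDir = "C:\temp\dwnld\"
$filetype = "pdf"
$rand = New-Object System.Random

$keywords | foreach {
    # Build the search URL: {0} is the keyword, {1} is the file type
    $urlToScrapeWithKeyword = "http://www.google.be/search?hl=nl&tbo=d&biw=1229&bih=677&output=search&sclient=psy-ab&q={0}+filetype%3A{1}&btnK=" -f $_, $filetype
    $urlToScrapeWithKeyword | Out-Default
    # Sort-Object -Unique removes all duplicates; Get-Unique only removes adjacent ones
    (Invoke-WebRequest -UseBasicParsing -Uri $urlToScrapeWithKeyword).Links | select -ExpandProperty href | Sort-Object -Unique | foreach {
        if ($_ -match $pattern) {
            $Matches[0] | Out-Default
            try {
                # -ErrorAction Stop turns transfer errors into exceptions that the catch block can handle
                Start-BitsTransfer -Source $Matches[0] -Destination $storageDir -ErrorAction Stop
                "Download ok" | Out-Default
            } catch {
                "Download failed:" | Out-Default
                $_.Exception.Message | Out-Default
            }
            "Sleeping" | Out-Default
            # Random pause of 20 to 42 seconds between downloads
            Start-Sleep -Seconds ($rand.Next(20, 43))
        }
    }
}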
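
To get a feel for what a pattern like this extracts before you change it, you can try it on a single link. The href below is invented for illustration (real result links may look different), and the variable names are only used in this sketch.

```powershell
# Two pattern variants: one that only accepts http, one that also accepts https.
$httpOnly    = 'http://(.*?)[.]{1}pdf'
$httpOrHttps = 'https?://(.*?)\.pdf'

# Made-up href in the style of a search-result redirect link.
$href = '/url?q=http://example.com/docs/manual.pdf&sa=U'

if ($href -match $httpOnly) {
    $wholeMatch = $Matches[0]   # http://example.com/docs/manual.pdf
    $firstGroup = $Matches[1]   # example.com/docs/manual
}

# The http-only pattern skips https links, the second variant does not.
$skipsHttps   = -not ('https://example.com/a.pdf' -match $httpOnly)
$acceptsHttps = 'https://example.com/a.pdf' -match $httpOrHttps
```

$Matches[0] always holds the whole match, which is the value the script hands to Start-BitsTransfer.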

Enjoy ;-)
Don’t forget, web scraping can be illegal! Use it with care!

Download file with PowerShell

This script downloads a file and saves it in PowerShell's current working directory.

You can use the current working directory ($pwd), set a fixed location such as "C:\Downloads", or first change the directory in PowerShell.
Examples of setting the directory:

$storageDir = $pwd

# or:

$storageDir = "C:\Downloads"

# or:

cd C:\Users\Teusje\Documents
$storageDir = $pwd

Below is the script to download a file via PowerShell. You can run it directly in PowerShell:

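$storageDir = $pwd
$webclient = New-Object System.Net.WebClient
$url = "http://teusje.files.wordpress.com/2011/02/giraffe-header1.png"
# Keep the extension in line with the remote file (a .png here)
$file = "$storageDir\myNewFilename.png"
# Blocks until the download has finished
$webclient.DownloadFile($url, $file)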
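
As an alternative, here is a rough sketch that uses Invoke-WebRequest (PowerShell 3.0 or later) and takes the local file name from the URL, so the extension always matches the remote file. The function names Get-FileNameFromUrl and Save-UrlToDirectory are made up for this example.

```powershell
# Hypothetical helper: return the last path segment of a URL,
# e.g. "giraffe-header1.png". LocalPath drops any query string.
function Get-FileNameFromUrl {
    param([Parameter(Mandatory)][string]$Url)
    [System.IO.Path]::GetFileName(([uri]$Url).LocalPath)
}

# Hypothetical helper: download $Url into $Directory and return the local path.
function Save-UrlToDirectory {
    param(
        [Parameter(Mandatory)][string]$Url,
        [string]$Directory = $pwd.Path
    )
    $file = Join-Path $Directory (Get-FileNameFromUrl -Url $Url)
    # -OutFile writes the response body straight to disk.
    Invoke-WebRequest -Uri $Url -OutFile $file -UseBasicParsing
    $file
}
```

For example, Save-UrlToDirectory -Url $url -Directory "C:\Downloads" would be expected to save the image as C:\Downloads\giraffe-header1.png, assuming that directory exists.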

Have fun! ;)