Scrape and download PDF files from Google or Bing using PowerShell

Here is a very simple script that you can run in the PowerShell ISE. It could probably be written much better, but it works. The script relies on Google's filetype: search operator to find files of a specific type. The same approach should also work with the Bing search engine.
To make the script work, make sure the directory C:\temp\dwnld\ exists. You can also easily change the regular expression pattern and the keywords.
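
If you would rather not create the directory by hand, something like the following should do it. This is only a sketch of an optional extra step: it uses the same C:\temp\dwnld\ path as the script, and the Test-Path check is my own addition.

```powershell
# Optional helper step (an addition, not part of the original script):
# create the download directory if it is not there yet.
$storageDir = "C:\temp\dwnld\"

if (-not (Test-Path -Path $storageDir -PathType Container)) {
    # -Force also creates missing parent directories such as C:\temp
    New-Item -Path $storageDir -ItemType Directory -Force | Out-Null
}
```

Running it a second time is harmless, because the directory is only created when it does not already exist.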

Comments with modifications on the scripts are always welcome 😉

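# Keywords to search for; one search is run per keyword
$keywords = @("manual", "microsoft", "powershell")
# Matches http and https links that end in .pdf
$pattern = 'https?://(.*?)[.]pdf'
# Download directory; it must already exist
$storageDir = "C:\temp\dwnld\"
$filetype = "pdf"
$rand = New-Object System.Random

$keywords | foreach {
    $urlToScrapeWithKeyword = "http://www.google.be/search?hl=nl&tbo=d&biw=1229&bih=677&output=search&sclient=psy-ab&q={0}+filetype%3A{1}&btnK=" -f $_, $filetype
    $urlToScrapeWithKeyword | Out-Default
    # Get-Unique only removes adjacent duplicates, so sort the links first
    (Invoke-WebRequest -UseBasicParsing -Uri $urlToScrapeWithKeyword).Links | select -ExpandProperty href | Sort-Object | Get-Unique | foreach {
        if ($_ -match $pattern) {
            $Matches[0] | Out-Default
            try {
                # -ErrorAction Stop ensures a failed transfer reaches the catch block
                Start-BitsTransfer -Source $Matches[0] -Destination $storageDir -ErrorAction Stop
                "Download ok" | Out-Default
            } catch [exception] {
                "Download failed:" | Out-Default
                $_.Exception.Message | Out-Default
            }
            "Sleeping" | Out-Default
            # Random pause of 20 to 42 seconds between downloads
            Start-Sleep -Seconds ($rand.Next(20, 43))
        }
    }
}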
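
To see what the pattern actually picks out of a link, you can try it on a single string first. The snippet below is only a sketch: the href is a made-up example (the example.com address and the extra query parameters are assumptions, not real search output), and the pattern is a variant that accepts both http and https.

```powershell
# Variant of the pattern that accepts http and https links ending in .pdf
$pattern = 'https?://(.*?)[.]pdf'

# Hypothetical href, shaped like a search-result redirect link
$href = '/url?q=http://example.com/docs/manual.pdf&sa=U&ved=abc'

if ($href -match $pattern) {
    # $Matches[0] is the whole match: the PDF address without the surrounding redirect
    $pdfUrl = $Matches[0]
    # $Matches[1] is the capture group: everything between the scheme and ".pdf"
    $pdfPath = $Matches[1]
    $pdfUrl | Out-Default
    $pdfPath | Out-Default
}
```

Because the quantifier is lazy (.*?), the match stops at the first ".pdf" it finds, so trailing parameters such as &sa=U are left out of the download address.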

Enjoy 😉
Don’t forget, web scraping can be illegal! Use it with care!

SkyDrive update: now you can select what to sync

With today’s release, you can now select which folders from SkyDrive are synced – making it easier to use SkyDrive with laptops or tablets with small drives. You’re in control. If you’d like to keep all your photos and documents in SkyDrive but only sync a folder of your most important documents to your laptop, you can do that – even if your desktop is syncing the full set. You can choose specific sub-folders to sync as well; you aren’t limited to your primary SkyDrive folders.

We expect the update to be available to everyone within 48 hours. If you’re anxious and want it now, you can find links to download updates for Windows, Windows Phone, Mac, and Android from http://apps.live.com/skydrive.

Kinect SDK Beta

Download the Kinect SDK Beta (x86) (x64)

Check out the Quickstarts for Kinect SDK!
See also Kinect for Windows SDK Quickstarts from Channel 9.

More information on the Microsoft Research project website.

The App Download Tag

Use the new App Download Tag to link users to experiences based on their mobile platform

Now a Tag can be smart, knowing the type of phone that is scanning the Tag and allowing you to intelligently deliver experiences based on mobile platform. App Download Tag is a new type of Tag that can be linked to a specific mobile marketplace or mobile site optimized for the device.

Have a mobile app for Windows Phone, iPhone, Android, and BlackBerry? Create an App Download Tag that recognizes what kind of device is scanning the Tag and sends the user to the relevant app marketplace. No longer will you need to rely on consumers finding your app through marketplace searches – instead, you can direct them to the exact download location so Tag scanners get the right experience every time.

Have you built a mobile site that uses Flash technology? Direct your iPhone users to a site that is instead optimized for their device, ensuring that they get the best mobile experience possible. Eliminate broken rich media elements on your mobile site – using the App Download Tag will allow you to optimize your mobile experience based on the device capabilities.

One Tag for all. The beauty of App Download Tag is that you can use the same Tag for all mobile platforms. There’s no need to print multiple Tags pointing to different URLs. The App Download Tag will do the work to deliver the right URL for each device.

App Download Tags can be created for Windows Phone, iPhone, Android, BlackBerry, Symbian and J2ME platforms. Users can also be sent to a default URL if a specific platform is not indicated.

Downloading files with PowerShell using the BitsTransfer module

@sstranger told me on Twitter that you can also use the BitsTransfer module to download files via PowerShell (previous post).

He wrote a script to download several files from the Microsoft website.

Here is a modified version of his script, using test files with size 50MB and 100MB:

# path to store the data
$global:path = "c:\TEMP\"

# loads the BitsTransfer Module
Import-Module BitsTransfer
Write-Host "BitsTransfer Module is loaded"

# test data from http://www.thinkbroadband.com/download.html
$fileLinks = @("http://download.thinkbroadband.com/50MB.zip",
 "http://download.thinkbroadband.com/100MB.zip");

# start the download
Foreach ($fileLink in $fileLinks)
{
 Start-BitsTransfer $fileLink $path
}
Write-Host "Files are downloaded to $path"
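
If one of the links is unreachable, it can help to catch the error per file instead of letting the loop print a red error. Here is a rough sketch of that idea. The function name Get-FileWithBits is made up for this example, and it assumes a Windows machine where the BitsTransfer module is available.

```powershell
Import-Module BitsTransfer

# Hypothetical helper (the name is made up for this example, it is not part of the module)
function Get-FileWithBits {
    param(
        [Parameter(Mandatory = $true)][string]$Source,
        [Parameter(Mandatory = $true)][string]$Destination
    )
    try {
        # -ErrorAction Stop turns transfer errors into exceptions that catch can handle
        Start-BitsTransfer -Source $Source -Destination $Destination -ErrorAction Stop
        return $true
    } catch {
        Write-Warning "Download failed for ${Source}: $($_.Exception.Message)"
        return $false
    }
}

# Usage with the variables from the script above:
# foreach ($fileLink in $fileLinks) {
#     if (Get-FileWithBits -Source $fileLink -Destination $path) {
#         Write-Host "Downloaded $fileLink"
#     }
# }
```

The function returns $true or $false, so the calling loop can decide whether to continue, retry or stop.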

Save this script as PowerShellFileDownloader.ps1 and run it in the Windows PowerShell ISE (which is included by default in Windows 7).

If PowerShell refuses to run the script for security reasons, change your execution policy (read more about running PowerShell scripts here):

Set-ExecutionPolicy RemoteSigned

Warning: changing the execution policy can introduce a security risk.
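
One way to limit that risk is to relax the policy for the current session only. This is a sketch using the standard -Scope parameter of Set-ExecutionPolicy; whether it is enough depends on how your machine is configured (a Group Policy setting can still override it).

```powershell
# Show the execution policy at every scope, to see what is currently in effect
Get-ExecutionPolicy -List

# Relax the policy for this PowerShell session only;
# the setting is gone as soon as the window is closed
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process
```

With -Scope Process nothing is written to the registry, so the machine-wide policy stays as it was.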

Some screenshots of the script running with the BitsTransfer module:

Have fun! 😉

Download file with PowerShell

This script downloads a file and saves it in PowerShell's current working directory.

You can use the current working directory ($pwd), set a fixed location such as “C:\Downloads”, or first change the directory in PowerShell.
Examples of setting the directory:

$storageDir = $pwd

# or:

$storageDir = "C:\Downloads"

# or:

cd C:\Users\Teusje\Documents
$storageDir = $pwd

Below is the script to download a file via PowerShell. You can run it directly in PowerShell:

# Save to the current working directory
$storageDir = $pwd
$webclient = New-Object System.Net.WebClient
$url = "https://teusje.files.wordpress.com/2011/02/giraffe-header1.png"
# Use an extension that matches the downloaded file type
$file = "$storageDir\myNewFilename.png"
$webclient.DownloadFile($url,$file)
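
The script above uses a fixed file name. If you prefer to keep the name the file has on the server, you could derive it from the URL. The following is just a sketch of that idea: $fileName is a variable I introduced for the example, and the download call is left commented out.

```powershell
$storageDir = $pwd
$url = "https://teusje.files.wordpress.com/2011/02/giraffe-header1.png"

# Take the last part of the URL path as the file name
$fileName = [System.IO.Path]::GetFileName(([System.Uri]$url).LocalPath)

# Build the full target path inside the storage directory
$file = Join-Path -Path $storageDir -ChildPath $fileName
$file | Out-Default

# The download itself stays the same as in the script above:
# (New-Object System.Net.WebClient).DownloadFile($url, $file)
```

This keeps the extension in sync with the downloaded file automatically. It assumes the URL path ends in a file name; a URL that ends in a slash would give an empty name.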

Have fun! 😉