Web Scraping with PowerShell

In PowerShell v3 you have some new useful cmdlets that allow you to download and parse a website.
The code in this post will demonstrate very basic scripts that could get you started with Web Scraping.

If you don’t know if you have PowerShell v3, use this command to find out:

get-host

The first script to get you started with web scraping:

$site = Invoke-WebRequest -UseBasicParsing -Uri www.bing.com
$site.Links | Out-GridView

This will give you all the links from the given website in a gridview.

The next script will give you all the email addresses that are in a mailto: anchor:

$site = Invoke-WebRequest -UseBasicParsing -Uri www.mywebsite.net
$site.Links | foreach {
if ($_.href.ToLower().StartsWith("mailto:")) {
$_.href.SubString(7) | Out-Default
}
}

By coincidence the ‘mywebsite.net’ has anchors using the mailto: prefix.

The last script is a very cool script from StackOverflow where I just modified the url to make sure the script works in several European countries:

function Get-FlightStatus {
     param($query)
$url = "http://www.bing.com?cc=us&q=flight status for $query"
$result = Invoke-WebRequest $url
$result.AllElements |
        Where Class -eq "ans" |
        Select -First 1 -ExpandProperty innerText
}

Use it like this:
(to test you can just paste this after the function in Windows PowerShell ISE )

Get-FlightStatus LH3102

It will give you a result similar to this:

Flight status for Lufthansa 3102 
flightstats.com · 2 minutes ago   

Departing on time at 5:35 PM from HAM 
FROMHAM 
Hamburg5:35 PM 
12/30/2012Terminal 2 
TOVIE 
Vienna7:05 PM 
12/30/2012

PS C:\>

Don’t forget, web scraping can be illegal!

Have fun 😉

Take a look at “Web Scraping with Perl” and the PowerShell tag.

Advertisements

7 thoughts on “Web Scraping with PowerShell

  1. Illegal? From what I understand screen scraping is not illegal, unless you are scraping website data and storing that information in a database. Am I correct on this?

    • I don’t know the details, I’m just warning the people.
      It is possible that you run a script for a few days and that you download and store information that is copyrighted or needs permission from the owner/creator, etc…
      Also I don’t know how every country handles these thinks.

      But I’m curious, if you know any answer, please share with us 🙂

  2. This is a very nice article indeed – thank you! As for the “legality” stuff…
    The legality is tied mostly to derivative use. You can’t prevent scraping because a browser is essentially a scraper too. Technically, your eyes are web scrapers too.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s