Web Scraping with PowerShell

PowerShell v3 introduces some useful new cmdlets, such as Invoke-WebRequest, that let you download and parse a web page.
The scripts in this post are very basic, but they should get you started with web scraping.

If you don’t know whether you have PowerShell v3, run this line to find out (the Major number should be 3 or higher):

$PSVersionTable.PSVersion

The first script to get you started with web scraping:

$site = Invoke-WebRequest -UseBasicParsing -Uri www.bing.com
$site.Links | Out-GridView

This shows all links on the page in a grid view window.
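The Links collection usually mixes relative paths, mailto: anchors and full URLs. A minimal sketch, assuming you only want each absolute http/https URL once: the hypothetical function Resolve-PageLink resolves every href against the page address with System.Uri. Only $site.Links and its href property come from the script above; the function name and the -BaseUri parameter are my own.

```powershell
function Resolve-PageLink {
    # Hypothetical helper: turns the href values of $site.Links into unique absolute http/https URLs
    param(
        [Parameter(Mandatory = $true)] [string] $BaseUri,
        [Parameter(ValueFromPipeline = $true)] $Link
    )
    begin {
        $base = [System.Uri]$BaseUri
        # Case-sensitive set, so /Page and /page stay separate entries
        $seen = New-Object 'System.Collections.Generic.HashSet[string]'
    }
    process {
        $href = [string]$Link.href
        # Skip anchors without an href attribute
        if (-not $href) { return }
        $absolute = $null
        if ($href -match '^[a-z][a-z0-9+.-]*:') {
            # The href already has a scheme (http:, https:, mailto:, javascript:, ...)
            if (-not [System.Uri]::TryCreate($href, [System.UriKind]::Absolute, [ref]$absolute)) { return }
        }
        else {
            # Relative href such as "/about" or "page.html": resolve it against the page address
            $relative = $null
            if (-not [System.Uri]::TryCreate($href, [System.UriKind]::Relative, [ref]$relative)) { return }
            if (-not [System.Uri]::TryCreate($base, $relative, [ref]$absolute)) { return }
        }
        # Keep only web links and emit each URL once
        if ($absolute.Scheme -eq 'http' -or $absolute.Scheme -eq 'https') {
            if ($seen.Add($absolute.AbsoluteUri)) { $absolute.AbsoluteUri }
        }
    }
}
```

Usage: $site.Links | Resolve-PageLink -BaseUri 'http://www.bing.com/' | Out-GridView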

The next script will give you all the email addresses that are in a mailto: anchor:

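$site = Invoke-WebRequest -UseBasicParsing -Uri www.mywebsite.net
$site.Links | ForEach-Object {
    # Skip anchors without an href and keep only mailto: links
    if ($_.href -and $_.href.ToLower().StartsWith("mailto:")) {
        # Strip the 7-character "mailto:" prefix and print the address
        $_.href.Substring(7) | Out-Default
    }
}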

By coincidence the ‘mywebsite.net’ has anchors using the mailto: prefix.
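Real mailto: links often carry extra parts such as ?subject=..., upper-case prefixes or escaped characters. A minimal sketch of a more tolerant variant, assuming the same $site.Links input; the function name Get-MailtoAddress is hypothetical.

```powershell
function Get-MailtoAddress {
    # Hypothetical helper: extracts unique e-mail addresses from mailto: anchors in $site.Links
    param([Parameter(ValueFromPipeline = $true)] $Link)
    begin {
        # Compare addresses case-insensitively so Info@Example.com and info@example.com count once
        $seen = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::OrdinalIgnoreCase)
    }
    process {
        $href = [string]$Link.href
        # -like is case-insensitive, so both MAILTO: and mailto: match
        if ($href -notlike 'mailto:*') { return }
        # Drop the "mailto:" prefix and any "?subject=..." part
        $address = $href.Substring(7).Split('?')[0]
        # Decode escapes such as %40 back to @
        $address = [System.Uri]::UnescapeDataString($address)
        if ($address -and $seen.Add($address)) { $address }
    }
}
```

Usage: $site.Links | Get-MailtoAddress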

The last script is a very cool one from Stack Overflow; I only modified the URL so that it also works in several European countries:

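function Get-FlightStatus {
    param($query)
    # cc=us sets the country to US, so the flight-status answer also appears outside the US
    $url = "http://www.bing.com?cc=us&q=flight status for $query"
    # No -UseBasicParsing here: AllElements is only filled by the full (Internet Explorer based) parser
    $result = Invoke-WebRequest $url
    # Return the text of the first element with class "ans" (Bing's answer box at the time of writing)
    $result.AllElements |
        Where-Object Class -eq "ans" |
        Select-Object -First 1 -ExpandProperty innerText
}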

Use it like this (to test it, just paste this line after the function in Windows PowerShell ISE):

Get-FlightStatus LH3102

It will give you a result similar to this:

Flight status for Lufthansa 3102 
flightstats.com · 2 minutes ago   

Departing on time at 5:35 PM from HAM 
FROMHAM 
Hamburg5:35 PM 
12/30/2012Terminal 2 
TOVIE 
Vienna7:05 PM 
12/30/2012

PS C:\>
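Get-FlightStatus puts the query straight into the URL and relies on .NET to escape the spaces. A minimal sketch of building the same URL with explicit escaping, so that characters such as & or # in a query cannot break it; the helper name New-BingQueryUrl is hypothetical, and the URL format is copied from the function above.

```powershell
function New-BingQueryUrl {
    # Hypothetical helper: builds the Bing URL used by Get-FlightStatus with an escaped query
    param([Parameter(Mandatory = $true)] [string] $Query)
    # EscapeDataString encodes spaces as %20 and reserved characters such as & and #
    $encoded = [System.Uri]::EscapeDataString("flight status for $Query")
    "http://www.bing.com?cc=us&q=$encoded"
}
```

To use it, the $url line in Get-FlightStatus could become: $url = New-BingQueryUrl $query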

Don’t forget, web scraping can be illegal!

Have fun 😉

Take a look at “Web Scraping with Perl” and the PowerShell tag.


Windows Server: logging users logon and logoff via PowerShell

Are you planning a migration and want to monitor, for a few weeks, when your server is used the most?

  1. Open Windows PowerShell ISE (or Notepad 😉).
  2. Add the PowerShell line below and save the script as monitorlogon.ps1:
       "logon {0} {1} {2:yyyy-MM-dd HH:mm:ss}" -f $env:username, $env:computername, (Get-Date) >> logon.log
  3. Create another script file, add the PowerShell line below and save it as monitorlogoff.ps1:
       "logoff {0} {1} {2:yyyy-MM-dd HH:mm:ss}" -f $env:username, $env:computername, (Get-Date) >> logoff.log
  4. Start the Local Group Policy Editor ([Windows]+[r] > gpedit.msc).
  5. Navigate to [User Configuration] > [Windows Settings] > [Scripts (Logon/Logoff)].
  6. Double-click [Logon].
  7. Open the [PowerShell Scripts] tab.
  8. Click the [Add] button and select your monitorlogon.ps1 script.
  9. Optionally, select the execution order; the default is “Not configured”.
  10. Repeat steps 4 to 9 for the logoff script: double-click [Logoff] and add monitorlogoff.ps1.

To collect the logs in one place, you can change the >> filename.log part to >> \\MyShare\filename.log; the users then need write access to that share.
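Once a few weeks of data have been collected, you can count the entries per hour of the day to see when the server is busiest. A minimal sketch, assuming the line format written by the PowerShell lines above ("logon <user> <computer> yyyy-MM-dd HH:mm:ss"); the function name Get-LogonCountByHour is hypothetical.

```powershell
function Get-LogonCountByHour {
    # Hypothetical helper: counts log entries per hour of the day in logon.log or logoff.log
    # Assumes each line ends with a "yyyy-MM-dd HH:mm:ss" timestamp; other lines are ignored
    param([Parameter(Mandatory = $true)] [string] $Path)
    Get-Content -Path $Path |
        ForEach-Object {
            # Capture the hour from the timestamp at the end of the line
            if ($_ -match ' \d{4}-\d{2}-\d{2} (\d{2}):\d{2}:\d{2}$') { [int]$Matches[1] }
        } |
        Group-Object |
        Sort-Object { [int]$_.Name } |
        Select-Object @{ Name = 'Hour'; Expression = { [int]$_.Name } }, Count
}
```

For example, Get-LogonCountByHour -Path .\logon.log | Sort-Object Count -Descending lists the busiest hours first.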

If you want to do this on Windows Server 2003, where you can’t run your PowerShell scripts, save the script as a *.cmd file instead:

  1. Create a new file, enter the line below and save it as monitorlogon.cmd:
       echo logon %username% %computername% %date% %time% >> C:\logon.log
  2. Repeat this for monitorlogoff.cmd, changing the line to:
       echo logoff %username% %computername% %date% %time% >> C:\logoff.log
  3. Follow the Group Policy steps from the PowerShell section, but add the .cmd files on the [Scripts] tab instead of the [PowerShell Scripts] tab.

PowerShell: List all folders where access is denied

I needed a list of all folders I couldn’t access.

Here is the PowerShell script (replace C:\app with the folder you want to check):

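# Collect all errors of the recursive listing in $errors; -ErrorAction SilentlyContinue keeps them off the screen
Get-ChildItem -Recurse 'C:\app' -ErrorAction SilentlyContinue -ErrorVariable errors | Out-Null
# Number of errors, typically one per folder that could not be read
$errors.Count
# Print the message of each error
$errors | ForEach-Object { Write-Host $_ }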

If you also want to see the files and folders that were listed in your PowerShell console, remove the | Out-Null.

This is a possible output from the script:

3 
Access to the path 'C:\app\pfile' is denied. 
Access to the path 'C:\app\adump' is denied. 
Access to the path 'C:\app\diag' is denied.
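$errors holds every error from the listing, not only access-denied ones (for example, paths that are too long also end up there). A minimal sketch that keeps only the permission errors and returns just the folder paths; the function name Get-DeniedPath is hypothetical, and it assumes Get-ChildItem tags these errors with the PermissionDenied category and stores the path in TargetObject.

```powershell
function Get-DeniedPath {
    # Hypothetical helper: keeps only "access denied" errors and returns the path of each one
    param(
        [Parameter(ValueFromPipeline = $true)]
        [System.Management.Automation.ErrorRecord] $ErrorRecord
    )
    process {
        # Assumption: Get-ChildItem reports an unreadable folder with the PermissionDenied category
        if ($ErrorRecord.CategoryInfo.Category -eq 'PermissionDenied') {
            # Assumption: for these errors TargetObject holds the folder path
            $ErrorRecord.TargetObject
        }
    }
}
```

Usage, after running the script above: $errors | Get-DeniedPath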

Enjoy 😉

[ technet ]