Friday 21 September 2012

A nice little one-liner to return all the links from a web page. Relative links are prefixed with the source URL, which is captured in $url before the pipeline replaces $_ with each regex match:

"http://www.google.com" | %{$url = $_; ([regex]"<a.*?href=[""'](?<url>[^""^']+[.]*?)[""'].*?>(?<keywords>[^<]+[.]*?)</a>").Matches((new-object system.Net.WebClient).DownloadString($url)) | select @{Name="text";Expression={$_.Groups[2].Value.trim()}}, @{Name="href";Expression={$tmpVal = $_.Groups[1].Value.trim();if ($tmpVal.startswith("http")){$tmpVal}else{$url+$tmpVal}}}}

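For readability, the same idea can be sketched as a small function. This is only a rough sketch, not a definitive version: the function name Get-LinksFromHtml and its parameters are made up for illustration, the regex is a simplified variant of the one above, and relative links are resolved with System.Uri instead of plain string concatenation. Like the one-liner, it only picks up anchors whose content is plain text.

```powershell
# Hypothetical helper (name and parameters are assumptions, not from the original post).
# Extracts anchor text and href from an HTML string, resolving relative links against $BaseUrl.
function Get-LinksFromHtml {
    param(
        [string]$Html,
        [string]$BaseUrl
    )

    # Single-quoted string: '' is an escaped single quote inside the pattern.
    $pattern = '<a\s[^>]*?href=["''](?<url>[^"'']+)["''][^>]*>(?<text>[^<]+)</a>'
    $base = New-Object System.Uri($BaseUrl)

    foreach ($m in [regex]::Matches($Html, $pattern, 'IgnoreCase')) {
        [string]$href = $m.Groups['url'].Value.Trim()

        # Uri(Uri, String) leaves absolute links untouched and resolves relative ones.
        $resolved = New-Object System.Uri($base, $href)

        New-Object PSObject -Property @{
            text = $m.Groups['text'].Value.Trim()
            href = $resolved.AbsoluteUri
        }
    }
}
```

To run it against a live page, download the HTML first with (New-Object System.Net.WebClient).DownloadString("http://www.google.com") and pass the result as -Html, with the same address as -BaseUrl.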
