Sunday 22 January 2017

Introduction to perl LWP

In this article I'd like to give a brief overview of how to use the Perl LWP module (LWP is short for libwww-perl).
The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web.
The main focus of the library is to provide classes and functions that allow you to write WWW clients.
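The examples in this post use LWP::Simple, the function-based part of the library. The class-based interface is LWP::UserAgent. It gives you the full response (status line, headers) and lets you set options such as a timeout. Below is a minimal sketch that assumes LWP is already installed (installation is covered below). The agent string, the timeout value and the fetch helper name are illustrative choices, not anything the library requires:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Print decoded page text as UTF-8 rather than raw Perl characters.
binmode STDOUT, ':encoding(UTF-8)';

# One user agent object can be reused for many requests.
my $ua = LWP::UserAgent->new(
    agent   => 'lwp-demo/0.1',   # illustrative User-Agent header value
    timeout => 15,               # give up after 15 seconds of inactivity
);

# Hypothetical helper: GET a URL and return the decoded body, or die
# with the status line (e.g. "404 Not Found") if the request failed.
sub fetch {
    my ($url) = @_;
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Usage: ./ua.pl https://www.4shared.com/
print fetch($ARGV[0]) if @ARGV;
```

Unlike LWP::Simple's get, a failed request here tells you why it failed instead of just returning undef.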

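I had read that the module should be available out of the box with my Perl distribution.
But it wasn't installed on my CentOS 7 box (LWP is not a core Perl module, so whether it comes preinstalled depends on the OS distribution), so I installed it via yum: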

yum install *perl-LWP* -y
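To confirm the install worked, a short script can load LWP and print its version. Fetching https:// URLs, as the first example below does, also needs the LWP::Protocol::https backend. On CentOS 7 that backend should come from the perl-LWP-Protocol-https package, which the yum pattern above matches. The sketch therefore also reports whether that module can be loaded:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP;    # loads the library and sets $LWP::VERSION

print "libwww-perl version: $LWP::VERSION\n";

# https:// support lives in a separate module; try to load it without
# dying, and report the result.
my $has_https = eval { require LWP::Protocol::https; 1 } ? 'yes' : 'no';
print "https support: $has_https\n";
```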

With that done, let's get to the code.

The first example prints the HTML source code of a web page.

[root@alive ~]# cat web.pl
#!/usr/bin/perl -w
#
use strict;
use LWP::Simple;

print get("https://www.4shared.com/");


The module used is LWP::Simple. Its get function fetches the page and returns its content (or undef if the request fails), and print writes that content to STDOUT.
Running the program produces the output below:

[root@alive ~]# ./web.pl | more
Wide character in print at ./web.pl line 6.
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>4shared.com - free file sharing and storage</title>
  <meta name="Description" content="Online file sharing and storage - 15 GB free web space. Easy registration. File upload progressor. Multiple file transfer. Fast download.">
  <meta name="Keywords" content="file sharing, free web space, online storage, share files, photo image music mp3 video sharing, dedicated hosting, enterprise sharing, file transfer, file hosting, internet file sharing">
  <meta name="google-site-verification" content="TAHHq_0Z0qBcUDZV7Tcq0Qr_Rozut_akWgbrOLJnuVo"/>
  <meta name="google-site-verification" content="1pukuwcL35yu6lXh5AspbjLpwdedmky96QY43zOq89E" />
  <meta name="google-site-verification" content="maZ1VodhpXzvdfDpx-2KGAD03FyFGkd7b7H9HAiaYOU" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <meta content="IE=edge" http-equiv="X-UA-Compatible">

  <meta property="og:title" content="4shared - free file sharing and storage"/>
<meta property="og:description" content="4shared is a perfect place to store your pictures, documents, videos and files, so you can share them with friends, family, and the world. Claim your free 15GB now!"/>
<link rel="stylesheet" type="text/css" href="https://static.4shared.com/css/common_n.4min.css?ver=2118177915"/>
<link rel="stylesheet" type="text/css" href="https://static.4shared.com/css/ui/elements.4min.css?ver=1246632214"/>
<link rel="stylesheet" type="text/css" href="https://static.4shared.com/auth-popup.4min.css?ver=-2080519390"/>
<link rel="stylesheet" type="text/css" href="https://static.4shared.com/css/themes/account/icons.4min.css?ver=-1551370407"/>
<link rel="stylesheet" type="text/css" href="https://static.4shared.com/css/tipTip.4min.css?ver=-207359769"/>
<script type="text/javascript" src="https://static.4shared.com/js/jquery/jquery-1.9.1.4min.js?ver=-885436651"></script>
<script type="text/javascript" src="https://static.4shared.com/js/jquery/jquery-migrate-1.2.1.4min.js?ver=1171340321"></script>
<script type="text/javascript">
--------------------------------------------
-------------------------------------------- Output truncated for brevity
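The "Wide character in print" line at the top of that output is a Perl warning, not part of the page. get returns the page already decoded into Perl characters, and Perl issues that warning when you print characters above code point 255 to a filehandle that has no encoding layer. get also returns undef when the request fails, and print turns that into an "uninitialized value" warning with no output. The sketch below is a slightly more defensive version of the same script that takes the URL from the command line (fetch_page is a hypothetical helper name):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# get() hands back decoded characters, so encode STDOUT as UTF-8;
# this is what removes the "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: fetch a page or die with a readable message,
# since get() itself only returns undef on failure.
sub fetch_page {
    my ($url) = @_;
    my $html = get($url);
    die "Couldn't fetch $url\n" unless defined $html;
    return $html;
}

# Usage: ./web.pl https://www.4shared.com/
print fetch_page($ARGV[0]) if @ARGV;
```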


In the next example we'll save a web page to a local HTML file using the getstore function.
Here's the code:

#!/usr/bin/perl -w
#
use strict;
use LWP::Simple;

#print get("https://www.4shared.com/");

getstore("http://hammersoftware.ca/custom-programming/", "lwptest.html");


This downloads the source code of the web page and saves it as an HTML document named lwptest.html in the current working directory (the directory you run the script from, since a relative file name is resolved against it).

[root@alive ~]# pwd
/root
[root@alive ~]# ls -l web.pl
-rwxr-xr-x. 1 root root 166 Jan 22 03:33 web.pl
[root@alive ~]# ls -l lwptest.html
-rw-r--r--. 1 root root 37789 Jan 22 03:33 lwptest.html
[root@alive ~]# file lwptest.html
lwptest.html: HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators
[root@alive ~]#
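getstore returns the HTTP status code of the request (200 on success), and LWP::Simple also exports is_success for testing that code. With it, a script can check whether the download actually worked instead of assuming it did. This is a minimal sketch; download is a hypothetical helper name, and the script takes the URL and the file name as arguments:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: save $url to $file and report whether it worked.
# getstore() returns the HTTP status code, e.g. 200 or 404.
sub download {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    unless (is_success($status)) {
        warn "Download of $url failed with status $status\n";
        return 0;
    }
    return 1;
}

# Usage: ./save.pl http://hammersoftware.ca/custom-programming/ lwptest.html
if (@ARGV == 2) {
    exit(download(@ARGV) ? 0 : 1);
}
```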

Along the same lines, in this final demo we'll download an image from a website into the current working directory.

#!/usr/bin/perl -w
#
use strict;
use LWP::Simple;

#print get("https://www.4shared.com/");

getstore("http://hammersoftware.ca/wp-content/uploads/2015/03/Perl-logo.jpg", "Logo.jpg");


Running the above code downloads the image from that URL and saves it as Logo.jpg.

[root@alive ~]# pwd;ls -l web.pl ;ls -l Logo.jpg ; file Logo.jpg
/root
-rwxr-xr-x. 1 root root 183 Jan 22 03:43 web.pl
-rw-r--r--. 1 root root 50630 Jan 22 03:43 Logo.jpg
Logo.jpg: JPEG image data, JFIF standard 1.01
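Before downloading a file, LWP::Simple's head function can fetch just the response headers. In list context it returns the Content-Type, document length, modification time, expiry time and server name. The minimal sketch below refuses to download anything whose Content-Type is not an image type. It assumes the server reports Content-Type accurately, which not every server does; is_image_url is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(head getstore is_success);

# Hypothetical helper: request headers only (no body) and check
# whether the reported Content-Type starts with "image/".
sub is_image_url {
    my ($url) = @_;
    my ($type) = head($url);    # list context: first value is Content-Type
    return defined $type && $type =~ m{^image/};
}

# Usage: ./img.pl http://hammersoftware.ca/wp-content/uploads/2015/03/Perl-logo.jpg Logo.jpg
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    die "$url does not look like an image\n" unless is_image_url($url);
    my $status = getstore($url, $file);
    die "Download failed with status $status\n" unless is_success($status);
}
```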


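Just a quick note here: specifying a file name for the web page or image you download via the getstore function is not optional.
getstore is declared to take exactly two arguments (the URL and the file name), so if you leave out the name, Perl refuses to compile the script and reports a "Not enough arguments" error.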
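Because the file name is required, a small helper can derive one from the URL when you don't want to pick it by hand. This sketch takes the last non-empty path segment using the URI module, which LWP itself depends on. For bare site URLs it falls back to index.html. The name filename_from_url and that fallback are illustrative choices. Note that a URL like http://hammersoftware.ca/custom-programming/ yields a name without an .html extension:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: last non-empty path segment, or index.html
# when the URL has no path (e.g. https://www.4shared.com/).
sub filename_from_url {
    my ($url) = @_;
    my @segments = grep { length } URI->new($url)->path_segments;
    return @segments ? $segments[-1] : 'index.html';
}

# Usage: ./save.pl http://hammersoftware.ca/wp-content/uploads/2015/03/Perl-logo.jpg
if (@ARGV == 1) {
    my $url    = $ARGV[0];
    my $file   = filename_from_url($url);
    my $status = getstore($url, $file);
    die "Download of $url failed with status $status\n" unless is_success($status);
    print "Saved $url as $file\n";
}
```

With the image URL from the last demo, this saves the file as Perl-logo.jpg.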
