Storing And Querying Information with PERL-MySQL

Disclaimer

These instructions/steps worked for me and it may very well work for you on Sun OS / linux distributions. Please note that if you decide to use these instructions on your machine (either for this particular application and/or for other kind of information), you are doing so entirely at your very own discretion and that neither this site, sgowtham.com, nor its author is responsible for any/all damage – intellectual or otherwise.


During the Christmas break of 2005-06 (while I was a graduate student in Michigan Technological University), I thought it would be nice to have a tool to store details pertaining to our research group publications in a MySQL database (instead of a flat HTML file). Some underlying motives were as follows:

  1. List all publications in a year – helps when generating annual progress reports
  2. List all publications by any given author(s)
  3. A combination of the above two possibilities

The server which served our research group website used Sun OS 5.6 and for security-related reasons, it did not have PHP. As such, it became necessary (a good thing!) that this utility had to be written using PERL CGI (Common Gateway Interface). List of required features (implemented over a period of time) included:

  1. User Side
    1. Display all publications when no query is submitted (default display) – include a search form
    2. Searchable by year of publication
    3. Searchable by two authors
    4. Searchable by a combination of #2 and #3
    5. When displaying queried results, include Abstract
  2. Admin Side:
    1. Adding a new publication to the database, with PDF upload option (GUI)
    2. Updating details for an entry that already exists in a database (GUI)
    3. Deleting an entry from the database (GUI)
    4. Restricted access

Starting with MySQL database/table structure, these requirements are described one by one in following subsections.

#0. Preliminary Settings

Let us assume that:

  1. The web documents are stored under /var/www/html and will henceforth be referred to as $DocumentRoot (if you are trying to implement this in your research group in a university/academic institution, please check with your systems administrator for the appropriate value of $DocumentRoot).
  2. The page (with search form) which users will interact is called publications.cgi and is stored under $DocumentRoot.
  3. The folder that contains PDF version of publications is called pdf and is located under $DocumentRoot. Make sure this folder has 777 permission.
  4. The folder that contains administrative CGI scripts is called admin and is located under $DocumentRoot. This folder must have restricted access.

#1. MySQL Database

  1. Create a database. If einstein is userid of the group’s principal investigator, then einstein_research would be just fine a name for the database.
  2. Create a dummy user (within MySQL), einstein_webuser, with at least INSERT, UPDATE, SELECT and DELETE previliges on einstein_research database.
  3. Set a (strong enough) password for this dummy user.
  4. Create a table within einstein_research database, publications, with following structure:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    
    CREATE TABLE `publications` (
      `PubID`    INT( 11 ) NOT NULL AUTO_INCREMENT ,
      `BibID`    VARCHAR( 15 ) DEFAULT NULL , 
      `Title`    VARCHAR( 255 ) NOT NULL,
      `Authors`  VARCHAR( 150 ) NOT NULL,
      `JName`    VARCHAR( 30 ) NOT NULL,
      `JVolume`  INT( 8 ) NOT NULL,
      `JYear`    YEAR( 4 ) NOT NULL,
      `JPage`    INT( 11 ) NOT NULL,
      `Abstract` text,
      UNIQUE KEY `BibID` ( `BibID` ) ,
      KEY `PubID` ( `PubID` )
    ) ENGINE = MYISAM DEFAULT CHARSET = latin1;

It is generally a good idea to back up these databases on a regular basis – to save oneself from the pain of re-entering everything if something were to go wrong. You may refer to one of my previous posts that discusses backing up and restoring MySQL databases in detail.

#2. User Side: publications.cgi

Following one of the requirements as mentioned before, this page must display all publications when no query is submitted (default view) and display only those publications (with abstract) when a query is submitted. The part below is the code for default view.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
#! /usr/bin/perl
 
# publications.cgi (PART #1)
# CGI script to search the database
 
# Modules being used
use DBI;
use CGI;
 
# Make a CGI object and retrieve information from the form
$inputform = new CGI;
print $inputform->header;
 
# Get values from the form
$year    = $inputform->param("year");
$title   = $inputform->param("title");
$author1 = $inputform->param("author1");
$author2 = $inputform->param("author2");
$bo1     = $inputform->param("BO1");
$bo2     = $inputform->param("BO2");
$bo3     = $inputform->param("BO3");
$ffill   = $inputform->param("ffill");
 
# If the variable 'ffill' is empty, then display the form
if ((!($ffill)))
{
print << "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
 
 
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
 
 
<!-- Group Header -->

YOUR GROUP NAME

Publications

Year Title
Author Author


** All fields are optional

EOF

# Connect to the database and perform the SQL query
$dbh = DBI->connect(‘dbi:mysql:einstein_research:localhost’,’einstein_web’,’PASSWORD’) or die “Error”;
$sql1 = “SELECT * FROM publications ORDER BY JYear DESC, “;
$sql = $sql1 . “JName ASC, JVolume DESC, JPage DESC, Title”;

$query = $dbh->prepare($sql);
$query->execute;
$nresults = $query->rows;

# Display the results
if ( $nresults > 0 ) {
print << “EOF”;

1
2
3
4
5
6
7
8
9
10
11
12
13
 
EOF
 
$id=1;
  while(($PubID,$BibID,$Title,$Authors,$JName,$JVolume,$JYear,$JPage,$Abstract)=$query-&gt;fetchrow()) {
print &lt;&lt; "EOF";
 
 
EOF
  $id++;
}
 
print &lt;&lt; "EOF";
# BibID / Title / Authors / Citation
(Last-to-first, with Journals alphabetically sorted)
$id $BibID |
PDF
$Title
$Authors
$JName, $JVolume, $JPage ($JYear)

EOF
} else {

print ”

Search Results

\n”;
print “Sorry, no records were found!“;
}

print << “EOF”;

1
2
3
4
5
<!-- HTML ends -->
 
EOF
exit;
}

The part below contains the code that takes care of search-results part.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# publications.cgi (PART #2)
 
# If the variable 'ffill' is not empty, then query the database
# and display results
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
 
 
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
 
 
<!-- Group Header -->

YOUR GROUP NAME

Publications

Year Title
Author Author


** All fields are optional

EOF

# If the variable ‘ffill’ is not empty, display the search results
# Connect to the database and perform the SQL query
$dbh = DBI->connect(‘dbi:mysql:einstein_research:localhost’,’einstein_web’,’PASSWORD’) or die “Error”;
$sql1 = “SELECT * FROM publications WHERE JYear LIKE \”$year\” $bo1 “;
$sql2 = $sql1 . “Title LIKE \”%$title%\” $bo2 Authors LIKE \”%$author1%\” $bo3 “;
$sql3 = $sql2 . “Authors LIKE \”%$author2%\” ORDER BY JYear DESC, JName ASC, “;
$sql = $sql3 . “JVolume DESC, JPage DESC, Title”;

$query = $dbh->prepare($sql);
$query->execute;
$nresults = $query->rows;

# Display the results
if ( $nresults > 0 ) {
print << “EOF”;

Search Results

Your query matched $nresults result(s)

1
2
3
4
5
6
7
8
9
10
11
12
13
 
EOF
 
$id=1;
  while(($PubID,$BibID,$Title,$Authors,$JName,$JVolume,$JYear,$JPage,$Abstract)=$query-&gt;fetchrow()) {
print &lt;&lt; "EOF";
 
 
EOF
  $id++;
}
 
print &lt;&lt; "EOF";
# BibID / Title / Authors / Citation / Abstract
(Last-to-first, with Journals alphabetically sorted)
$id $BibID |
PDF
$Title
$Authors
$JName, $JVolume, $JPage ($JYear)

$Abstract

EOF
} else {

print ”

Search Results

\n”;
print “Sorry, no records were found!“;
}

print << “EOF”;

1
2
3
4
5
6
<!-- HTML ends -->
 
EOF
 
# Close database connection
$dbh-&gt;disconnect;

#3. Admin Side: pub_entry.cgi

Guidelines for using this are as follows:

  1. BibID
    1. Cannot be empty
    2. Maximum of 15 characters
    3. Must be UNIQUE
    4. Cannot be changed later
    5. Good way to compose this is by using first letters of authors’ last names along with year
      (i.e., if R. J. Smith, A. S. Young and B. L. Spring published an article in 2006, BibID would be SYS2006.

      In case of single author, use Smith2006)

    6. If you expect to have more than one publication in a year by the same set of authors,
      use SYS2006_0, SYS2006_1, and so on OR Smith2006_0, Smith2006_1, etc.
    7. Required to delete/update an entry, if such a need arises
  2. Title
    1. Cannot be empty
    2. Maximum of 255 characters
    3. Can contain some HTML tags (<b>, <em>, <sup>, <sub>, etc)
    4. Use backslash ( \ ) to escape single-quotes ( ‘ )
  3. Authors
    1. Cannot be empty
    2. Maximum of 150 characters
    3. Separate authors by using ‘,’ (comma)
    4. Do not use ‘and’
  4. Journal :: Name
    1. Cannot be empty
    2. Maximum of 30 characters
    3. Can contain some HTML tags (<sup>, <sub>, etc)
  5. Journal :: Volume
    1. Cannot be empty
    2. Must be an integer
  6. Journal :: Year
    1. Cannot be empty
    2. Must be an integer
    3. Use all four digits to specify (like ‘2006’ instead of ’06’)
  7. Journal :: Starting Page
    1. Cannot be empty
    2. Must be an integer
  8. Abstract
    1. Can be empty
    2. Maximum of 255 characters
    3. Can contain some HTML tags
      (<b>, <em>, <sup>, <sub>, etc)
    4. Use backslash ( \ ) to escape single-quotes ( ‘ )
  9. Journal :: PDF Version
    1. Cannot be empty
    2. PDF document MUST have the SAME name as BibID
    3. MUST be PDF (not just the filename extension)
    4. Process DOES NOT check to make sure it is PDF

The code is split into two parts – default view is to just display a form. When properly submitted, it makes an entry into the database. Code for default view is below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#! /usr/bin/perl
 
# pub_entry.cgi (PART #1)
# CGI script to make an entry into publication
 
# Modules being used
use DBI;
use CGI;
use CGI::Carp qw ( fatalsToBrowser );
use File::Basename;
 
# Make a CGI object and retrieve information from the form
$inputform = new CGI;
print $inputform-&gt;header;
 
# Assign form-values to local variables
$bibid    = $inputform-&gt;param("bibid");
$title    = $inputform-&gt;param("title");
$authors  = $inputform-&gt;param("authors");
$jname    = $inputform-&gt;param("jname");
$jvolume  = $inputform-&gt;param("jvolume");
$jyear    = $inputform-&gt;param("jyear");
$jpage    = $inputform-&gt;param("jpage");
$abstract = $inputform-&gt;param("abstract");
$jpdfdoc  = $inputform-&gt;param("jpdfdoc");
$efill    = $inputform-&gt;param("efill");
 
# If variable 'efill' is empty, then display the form
if (!($efill)) {
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
<!-- Group Header -->

YOUR GROUP NAME

Publications :: Database Management :: Entry

BibID
Title
Authors
Journal :: Name
Journal :: Volume
Journal :: Year
Journal :: Starting Page
Journal :: Abstract
Journal :: PDF Version

1
<!-- HTML ends -->EOF exit; }

The details regarding uploading a document via PERL & CGI are contained in one of my previous posts. Code for entering the above information into the database follows below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# pub_entry.cgi (PART #2)
 
# If variable 'efill' is not empty, then insert the data
# into the database
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
 
 
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
 
 
<!-- Group Header -->

YOUR GROUP NAME

Publications :: Database Management :: Entry


EOF

# Connect to the database and perform the SQL query
$db = DBI->connect(‘dbi:mysql:einstein_research:localhost’,’einstein_web’,’PASSWORD’) or die “Error”;
$sql1 = “INSERT INTO publications VALUES (”, ‘$bibid’, ‘$title’, “;
$sql2 = $sql1 . “‘$authors’, ‘$jname’, ‘$jvolume’, ‘$jyear’, ‘$jpage’, “;
$sql = $sql2 . “‘$abstract’)”;
$query = $db->prepare($sql);
$query->execute;

# PDF upload part
# 5MB limit on the size of the document
$CGI::POST_MAX = 1024 * 1024 * 5;

# Characters that are allowed to be part of the filename
my $safe_characters = “a-zA-Z0-9_.-“;

# Update location
my $upload_location = “/var/www/html/pdf”;

my $filename = $inputform->param(“jpdfdoc”);
if ( !$filename ) {
print $inputform->header ( );
print << “EOF”;

There was a problem uploading the PDF

EOF
exit;
}

# Clean up the filename – remove any ‘path’ and split the filename into
# basename and extension
my ( $name, $path, $extension ) = fileparse ( $filename, ‘\..*’ );
$filename = $name . $extension;

# Replace blank space in filename with ‘underscore’
$filename =~ tr/ /_/;

# Remove any ‘not safe’ characters
$filename =~ s/[^$safe_characters]//g;

# Untaint the filename
if ( $filename =~ /^([$safe_characters]+)$/ ) {
$filename = $1;
} else {
die “Filename contains invalid characters”;
}

my $upload_filehandle = $inputform->upload(“jpdfdoc”);

open ( UPLOADFILE, “>$upload_location/$filename” ) or die “$!”;
binmode UPLOADFILE;

while ( <$upload_filehandle> ) {
print UPLOADFILE;
}

close UPLOADFILE;

print << “EOF”;

Data was successfully entered into the database. Click
here to make another entry.

1
2
3
4
5
6
<!-- HTML ends -->
 
EOF
 
# Close Database Connection
$db-&gt;disconnect;

#4. Admin Side: pub_update.cgi and pub_updated.cgi

Often times, it becomes necessary to update an entry (author names, citation, abstract, etc.) and to do so, the appropriate entry must be retrieved from the database. Code for pub_update.cgi follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#! /usr/bin/perl
 
# pub_update.cgi
# CGI script to retrieve a publication from the database
# for update
 
# Modules being used
use DBI;
use CGI;
 
# Make a CGI object and retrieve information from the form
$inputform = new CGI;
print $inputform-&gt;header;
 
# Assign form-values to local variables
$bibid = $inputform-&gt;param("bibid");
$rfill = $inputform-&gt;param("rfill");
 
# If variable 'rfill' is empty, then display the form
if (!($rfill)) {
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
 
 
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
 
 
<!-- Group Header -->

YOUR GROUP NAME

Publications :: Database Management :: Retrieve


BibID




1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
<!-- HTML ends -->
 
EOF
exit;
}
 
 
# If variable 'rfill' is not empty, retrieve data from the database
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
<!-- Group Header -->

YOUR GROUP NAME

Publications :: Database Management :: Retrieve


EOF

# Connect to the database and perform the SQL query
$db = DBI->connect(‘dbi:mysql:einstein_research:localhost’,’einstein_web’,’PASSWORD’) or die “Error”;
$sql = “SELECT * FROM publications WHERE BibID=’$bibid’ LIMIT 1”;
$query = $db->prepare($sql) or die “Error”;
$query->execute;

while(($PubID,$BibID,$Title,$Authors,$JName,$JVolume,$JYear,$JPage,$Abstract)=$query->fetchrow()) {
print << “EOF”;

BibID $BibID
Title
Authors
Journal :: Name
Journal :: Volume
Journal :: Year
Journal :: Starting Page
Journal :: Abstract

$Abstract

EOF } print << “EOF”;

1
<!-- HTML ends -->EOF # Close Database Connection $db-&gt;disconnect;

pub_update.cgi takes all the information and passes it over to pub_updated.cgi, which updates the record. Code for pub_updated.cgi follows below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#! /usr/bin/perl
 
# pub_updated.cgi
# CGI script to update an entry in the database
 
# Modules being used
use DBI;
use CGI;
 
# Make a CGI object and retrieve information from the form
$inputform = new CGI;
print $inputform-&gt;header;
 
# Assign form-values to local variables
$bibid    = $inputform-&gt;param("bibid");
$title    = $inputform-&gt;param("title");
$authors  = $inputform-&gt;param("authors");
$jname    = $inputform-&gt;param("jname");
$jvolume  = $inputform-&gt;param("jvolume");
$jyear    = $inputform-&gt;param("jyear");
$jpage    = $inputform-&gt;param("jpage");
$abstract = $inputform-&gt;param("abstract");
 
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
 
 
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
 
 
<!-- Group Header -->

YOUR GROUP NAME

Publications :: Database Management :: Retrieve


EOF

# Connect to the database and perform the SQL query
$db = DBI->connect(‘dbi:mysql:einstein_research:localhost’,’einstein_web’,’PASSWORD’) or die “Error”;
$sql1 = “UPDATE publications SET Title=’$title’,Authors=’$authors’,”;
$sql2 = $sql1 . “JName=’$jname’,JVolume=’$jvolume’,JYear=’$jyear’,”;
$sql = $sql2 . “JPage=’$jpage’,Abstract=’$abstract’ WHERE BibID=’$bibid’ “;

$query = $db->prepare($sql);
$query->execute;

print << “EOF”;

Data was successfully updated in the database. Click
here to update another entry.

1
2
3
4
5
6
<!-- HTML ends -->
 
EOF
 
# Close Database Connection
$db-&gt;disconnect;

#5. Admin Side: pub_delete.cgi

Seldom it becomes necessary to delete an entry from the database (a reference article entered in by mistake as a publication, etc.) and the code for pub_delete.cgi follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#! /usr/bin/perl
 
# pub_delete.cgi
# CGI script to delete an entry from the database
 
# Modules being used
use DBI;
use CGI;
 
# Make a CGI object and retrieve information from the form
$inputform = new CGI;
print $inputform-&gt;header;
 
# Assign form-values to local variables
$bibid = $inputform-&gt;param("bibid");
$dfill = $inputform-&gt;param("dfill");
 
# If variable 'dfill' is empty, then display the form
if (!($dfill)) {
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
 
 
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
 
 
<!-- Group Header -->

YOUR GROUP NAME

Publications :: Database Management :: Delete


BibID




1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
<!-- HTML ends -->
 
EOF
exit;
}
 
# If variable 'dfill' is not empty, then delete an entry
# from the database
print &lt;&lt; "EOF";
<!-- www.w3c.org standard -->
 
 
<!-- HTML begins -->
 
 
<!-- HEAD section -->
 
 
<!-- TITLE section -->
 
 
 
 
<!-- BODY section -->
 
 
<!-- Group Header -->

YOUR GROUP NAME

Publications :: Database Management :: Delete


EOF

# Connect to the database and perform the SQL query
$db = DBI->connect(‘dbi:mysql:einstein_research:localhost’,’einstein_web’,’PASSWORD’) or die “Error”;
$sql = “DELETE FROM publications WHERE BibID=’$bibid’ LIMIT 1”;
$query = $db->prepare($sql) or die “Error”;
$query->execute;

print << “EOF”;

Entry was successfully deleted from the database. Click
here to delete another entry.

1
2
3
4
5
6
<!-- HTML ends -->
 
EOF
 
# Close Database Connection
$db-&gt;disconnect;

#6. Admin Side: Restricting Access To Admin Interface

There are some ways of achieving this – to prevent unauthorized personnel from messing with databases and files.

  1. Password protect the admin folder – allow only a certain user from any machine
  2. IP address/Hostname based restriction – allow any user (preferably a group member) to access the admin interface from certain machine(s).
  3. Combination of #1 & #2 – allow only certain users from certain machines.

A simple Google!ing for password protection .htaccess .htpasswd should result in a good number of decent documents. But do check/discuss your approach with your systems administrator(s) – to make sure that it can be applied, is feasible and less mis-usable under existing circumstances.

Screenshots

The screenshots below were generated after incorporating the style sheets (CSS) that are in-line with Michigan Tech theme. The code in the above sections, however, results in very plain, bare-bone HTML pages. With a basic, working knowledge of CSS, it should not be too difficult to spice up the appearance:


Storing And Querying Information with PERL-MySQL

Publications – Default View


Storing And Querying Information with PERL-MySQL

Publications – When a query is performed


Storing And Querying Information with PERL-MySQL

Making a new entry


Storing And Querying Information with PERL-MySQL

Deleting an entry


Storing And Querying Information with PERL-MySQL

Retrieval for update


Storing And Querying Information with PERL-MySQL

Updating an entry

Demonstration

Live demo of this code, only the publications.cgi, is here. Others, as can be expected, are hidden in a secure folder.

One Reply to “Storing And Querying Information with PERL-MySQL”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.