UNIX Web Servers at Queen's University
Last Updated: February 21, 2008
- Web Account Responsibilities
- Web Content
- Server Software
- Indexing and Searching
- Log Files and Access Statistics
- Web Utilities
- Virtual Web Hosting
The web server software on ITS-managed departmental and central Solaris UNIX servers on campus is maintained by ITS. The servers run an up-to-date version of the Apache web server.
This document is intended for administrators of web sites on these ITS-managed UNIX servers. It describes how the servers are configured and what is available for use on them.
Web account owners should read and be familiar with at least the next two sections, which cover web account responsibilities and content. The remaining sections are more technical in nature and cover how the web server and software are configured.
B) Web Account Responsibilities
This section discusses the responsibilities and expectations of web account owners.
Web accounts must have a continuing faculty or staff member named as the owner of the account. The owner is the main contact point and is responsible for keeping track of some basic information for the account such as:
- A list of staff members or consultants who have access to the account;
- A breakdown of the site structure including any major sections that are controlled by additional web accounts and the contact person responsible for them;
- Documentation on how to run and maintain any web applications contained therein, including any software added to run/deploy the web site; and
- Documentation of all passwords used to access the main web account, and any user/password details specific to web applications, password-controlled sections of the site, and any MySQL database instances.
The account owner's responsibility extends to the account such that:
- When the password is changed by someone with account access, the web account owner must be informed and given the new password;
- The password must be changed when a person leaves or no longer requires access to the account, when the account has been compromised, or when a problem is suspected. For FTP-only accounts, you must contact ITS to change the password (see next item);
- Password change requests will only be accepted from the responsible web account owner or their supervisor; and
- An additional contact can be designated, although they must also be a continuing faculty or staff member.
Web accounts are treated as generic accounts as per the Queen's University Account Policy. If the account owner moves on, the department must replace them with a new owner or the account will be suspended.
The web account owner must inform ITS if the web site is no longer needed or if the site is moving to a different on-campus server or ISP. A temporary redirect can be enabled to inform users that the site has moved. Stale accounts will be removed to free up resources.
Departments and groups who outsource work or are considering outsourcing work on their web site should ensure that:
- There is a written contract between client and consultant;
- The terms of the contract are clearly stated, including
- The contract's length or completion date,
- A full outline of detailed expectations and deliverables,
- The knowledge-transfer and sign-off procedure at the contract's completion, and
- A documented maintenance strategy, including who is responsible for content integrity and how to maintain any software components added to deploy the web site;
- The password is changed at the end of the contract to protect the integrity of the site, unless other provisions have been worked out between the parties; and
- The consultants are made aware of all relevant Queen's guidelines and policy documents in addition to the contract.
Regular maintenance and clean-up should be performed on the account. Dated, unused or unneeded web pages and scripts should routinely be removed. Removing these orphaned web pages prevents them from being found through user bookmarks, other web pages or sites that have not been updated, or search engines. Regular cleanup ensures efficient use of your quota-controlled disk space on the server. It also reduces clutter and keeps the site tidy and easier to maintain.
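As a sketch of one way to find cleanup candidates, the following lists files untouched for over a year (the WEBDIR path is an assumption; point it at your account's actual document root):

```shell
#!/bin/sh
# Sketch: list files under a web directory that have not been modified
# in over a year, as candidates for cleanup. Review the list before
# deleting anything. WEBDIR is hypothetical; adjust it for your account.
WEBDIR="${WEBDIR:-$HOME/public_html}"
if [ -d "$WEBDIR" ]; then
    find "$WEBDIR" -type f -mtime +365 -print
fi
```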
ITS recommends that web site administrators create an archival copy of their web site on CD/DVD, tape, or their local disk. This is good for transient accounts such as conferences or events that will be removed but may be needed in the future. Long-term backups are not maintained on ITS managed servers.
The content and activities of web sites and virtual hosted web sites on ITS managed servers must adhere to Queen's policies and guidelines. Please pay particular attention to the following:
- All web pages accessible through Queen's computer networks must comply with University policies and regulations, and with applicable laws, including but not limited to the Queen's University Computer User Code of Ethics, Code of Conduct and related policies; and
- Direct advertising or promotion of commercial activities which do not directly support the University's scholarly and educational mission are not permitted on web pages made available through Queen's computer networks. Official Queen's web pages representing a department, faculty, unit, programme, or cluster of activities may include recognition of sponsorship or donor support for a particular event, programme, service, product, or facility within the limitations established by University policies and guidelines. While recognition of sponsorships should not include direct advertising, it may include links to the web pages of a sponsoring organization, or institution. For clarification and guidelines, contact Queen's Advancement Office Donor Relations (533-2060).
D) Server Software
The web server software is installed within the qlib userid with a common configuration maintained across servers for consistency. The server's daemon generally runs as the httpd user; this user has no special privileges on the server. For departmental servers, the document root is located within the department's web account so that they have complete control over their web material.
The web server is compiled with the modules that are part of the base source distribution. No extra modules such as ASP, FastCGI, or mod_perl are installed. ITS will not install Microsoft FrontPage extensions due to security concerns.
The main Queen's web server and some other ITS managed servers have PHP and MySQL capabilities. They are available upon request. ITS provides basic PHP (with some core modules) and MySQL installs, built from source code, and will not provide any additional modules. MySQL will be enabled for the main departmental account, and the department will have complete control over their database instance. Any additional departmental and course accounts share the database instance controlled by the main account. The departmental database administrator is responsible for creating the database user accounts and tables for any additional departmental users.
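As an illustration, a departmental database administrator might create a database and user for an additional course account along these lines (all names and the password here are hypothetical, and the exact syntax depends on the MySQL version installed):

```sql
-- Hypothetical sketch: run from the mysql client as the departmental
-- admin user of the shared database instance.
CREATE DATABASE course101;
GRANT SELECT, INSERT, UPDATE, DELETE ON course101.*
    TO 'course101'@'localhost' IDENTIFIED BY 'change-this-password';
FLUSH PRIVILEGES;
```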
Please note that ITS does not provide support for PHP and MySQL.
The web server software is configured to allow web site administrators the ability to control access to their web content, and to make it difficult to probe the web site and server for possible vulnerabilities. File system access is controlled with options and overrides enabled only for specific directories on the server.
The directives used to control file system access are described here. General directory listings are not enabled; this prevents a web site visitor from reading unintended files if a directory has no index file. The exception is the pub directory within a user's document root, which allows listings of files and directories when there is no index file. The server's document root will follow symbolic links, but an individual user's document root will only follow symbolic links to files and/or directories owned by that user; therefore, you cannot link to files and directories you do not own. The Server Side Includes (SSI) directive is enabled with no execute capability. This allows users to use SSI to add some dynamic content to a web page (e.g., a standard header and/or footer, today's date, or the file's modification date), but not to run or include the output of an executable script.
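For example, with SSI enabled but execution disabled, a page could include a shared header and show dates using the standard directives below, while an #exec directive would not be processed (the /header.html path is hypothetical, and the file extension that triggers SSI depends on the server's configuration):

```html
<!--#include virtual="/header.html" -->
<p>Today is <!--#echo var="DATE_LOCAL" -->.</p>
<p>Last modified: <!--#echo var="LAST_MODIFIED" -->.</p>
```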
Server generated error messages do not display the server version to discourage attempts to target (specific) web server vulnerabilities. In addition, some error messages have been customised to provide a generic response and a link back to the server's main page.
The .htaccess file allows users to control access to their web pages. Access can be restricted using IP addresses and/or authentication via a userid and password. Also see the ~qlib/apps/apache/examples directory on the server for a sample .htaccess file.
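A minimal sketch of a .htaccess file combining both methods, in the Apache 1.3/2.0-style syntax of the era (the AuthUserFile path and the network range shown are assumptions; substitute your own):

```apache
# Allow access from a campus network range OR to authenticated users.
AuthType Basic
AuthName "Departmental Pages"
AuthUserFile /path/to/your/.htpasswd
Require valid-user
Order deny,allow
Deny from all
Allow from 130.15.          # hypothetical campus network range
Satisfy any                 # an IP match or a valid password is sufficient
```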
Requests for additions to the server's mime.types file can be made through the departmental web administrator.
The server-wide cgi-bin directory contains programs ITS makes available to all users; these currently include cgiemail and a simple counter. No additional scripts will be added to the cgi-bin directory unless they are safe and beneficial to all users, and no suid programs are permitted.
Cgi-bin access for departmental and course accounts can be enabled through a script aliased directory; this allows the account user to add and manage their own scripts and programs. The departmental account's script alias is /dept-cgi/ and others will be /userid-cgi/ where userid is the account on the server. Copies of the server wide scripts should not be added to any other script aliased directory. Web account administrators using local copies of server wide scripts will be warned to update their web pages to use the server wide script and remove the local copies. If this is not done, ITS will disable these scripts and then remove them. This reduces possible vulnerabilities on the server with software that is not kept current.
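As an illustration, a minimal CGI script in shell placed in a script-aliased directory only needs to print a Content-Type header, a blank line, and the response body (the file name and page content here are hypothetical):

```shell
#!/bin/sh
# hello.cgi - minimal CGI sketch for a /userid-cgi/ directory.
# Print the required header, a blank line, then the page body.
printf 'Content-Type: text/html\r\n\r\n'
echo '<html><body><p>Hello from a script-aliased directory.</p></body></html>'
```

Remember to make the script executable (e.g., chmod 755 hello.cgi) before requesting it through the web server.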
ITS will disable any scripts or script-aliased directories that are adversely affecting the performance and/or use of the server. Scripts that have security vulnerabilities will also be disabled until they are fixed or a replacement is found. There have been instances where ITS has suspended or removed an account's cgi access. An example occurred in the spring of 2002, when the FormMail.pl script was disabled because it was being used by spammers to relay mail through the web servers. When a suitable fix was not found for this serious security vulnerability, the FormMail.pl script was removed and users were required to modify their web forms to use cgiemail. Then, in the fall of 2003, the cgiemail script was disabled for a few days until a fix was applied for a similar vulnerability.
Relaying of spam is considered a serious issue because university servers can be put on black lists and cause outgoing email to be rejected. Any software that permits this is not allowed on Queen's servers and computers.
Departmental and course cgi directories that have not been accessed in the last year will be disabled. The web administrator responsible must request to have the directory re-activated. Before the directory is re-activated, the contents of the directory must be cleaned of old, unused or suspect cgi scripts and files.
Web robots or spiders are automated programs that traverse the web and catalogue information they find on web sites. Most robots generate some type of web index which is then used by search engines to help users find information on the internet. Data collected by search engines can be kept for weeks, months, until the next time the robot comes around to the site, or in some cases indefinitely. Therefore, data can be outdated.
A robots.txt file provides a way to request that a robot limit its searching activities on the web site. The server's document root should have a robots.txt file to restrict access to key directories and files that should not be indexed or traversed. Directories and files that contain sensitive or restricted information (e.g., personal information or server statistics) or have content that changes often (e.g., daily news or dynamic web content) should be excluded. The departmental web administrator should ensure that a minimal robots.txt file is added to the server's document root if one is not present. The file should list directories and files that should not be indexed by search engines; for example:
- User-agent: *
- Disallow: /cgi-bin/
- Disallow: /search/
- Disallow: /stats/
- Disallow: /dailynews/
- Disallow: /deptonly/
- Disallow: /private/
- Disallow: /foo.html
A sample robots.txt file can be found in the ~qlib/apps/apache/examples directory on the server. For a more detailed description and examples, see the Standard for Robot Exclusion page.
Web authors can either request their web directory or file be included in the server wide robots.txt file, or use the META tag to control their pages. The Robots META tag allows any user to indicate whether a page should be indexed or if links on the page should be followed. For more information see the Robots META tag home page. Note that only a few robots implemented this tag when it first came out, but this is expected to change.
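For example, a page author could place the following tag inside the page's head element to ask compliant robots neither to index the page nor follow its links:

```html
<meta name="robots" content="noindex, nofollow">
```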
Everything not explicitly disallowed is considered fair game for a robot to retrieve and traverse. This includes any web pages, directories, and links within the server's document root.
H) Indexing and Searching
The main Queen's search engine found at http://www.queensu.ca/search/ is powered by Google. By default it will regularly index all .queensu.ca web sites. See section G for more details on controlling external search engines.
Departmental, course and virtual hosted web site administrators can set up local search capabilities for their web site, or portions of their site, using Google or other search engines. To add local indexing and searching capabilities to a web site, see the documentation for the search engine you intend to use.
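As a rough sketch, a site-restricted Google search form might look like the following (the sitesearch parameter and the hostname dept.queensu.ca are assumptions; consult Google's own documentation for the currently supported form parameters):

```html
<form method="get" action="http://www.google.com/search">
  <input type="hidden" name="sitesearch" value="dept.queensu.ca">
  <input type="text" name="q" size="30">
  <input type="submit" value="Search this site">
</form>
```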
Virtual hosted web sites that are part of the hosting server's search index will be converted to use the site's new URL over a six-month period. This is done to reduce the number of different instances of the same web site in the many search indexes on the internet. The directory or link in the hosting server's document root will be removed and added to the robots.txt file, and a redirect will be put in place for the transition period. This allows the site to be reached as before but will cause the site not to be indexed the next time a search engine returns to re-index it. The new virtual hosted web site will be indexed by remote search engines as they come around, and any new queensu.ca site will become part of the Queen's-wide search index.
The redirect directive allows an old URL to be mapped to a new one. Redirects are enabled through the server's configuration file.
Redirects are only used to transition key web sites and pages that have moved, or to temporarily fix errors in publications. Redirects for web pages that have moved can be put in place for a reasonable period of time, usually no more than 6 months. Redirects for errors in publications will be put in to allow the publisher time to correct the error and will be removed after the publication's next printing or six months (whichever comes first). Departments should verify all URLs in the final copy before sending it to the printer. Users who no longer have an account on the server must make a case for a redirect.
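In Apache's configuration, such a mapping is a single directive per URL; for example (all URLs below are hypothetical):

```apache
# Temporary redirect while a site moves to its new virtual host.
Redirect temp /olddept/ http://olddept.queensu.ca/
# Temporary fix for a misprinted URL, removed after the next printing.
Redirect temp /pubs/anual-report.html http://dept.queensu.ca/pubs/annual-report.html
```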
J) Log Files and Access Statistics
The server's AccessLog (i.e., access_log) and ErrorLog (i.e., error_log) files, which are located in the ~qlib/apps/apache/logs directory, are rotated weekly. This occurs every Monday at 12:00am, and the previous three weeks are usually kept (.0, .1 and .2). If traffic on a web site is particularly heavy, the log files may be rotated more often (e.g., the www.queensu.ca server logs are done daily and the previous seven days are kept).
The AccessLog file uses the Combined Logfile Format, which logs the access information plus the agent and referrer information to a single file. The access information records the pages being accessed, the IP address of the client, and the return code indicating whether the request was successful. The agent information adds the browser version and operating system of the connecting client, and the referrer information adds the page from which the request was referred. Image files (that is, .gif, .jpg, .jpeg, .ico and .png files) are excluded from the AccessLog file to keep its size down. The ErrorLog file logs any errors, warnings or problems that the server encounters.
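For instance, individual fields can be pulled out of a Combined Log Format entry with awk; the log line below is made up for illustration:

```shell
#!/bin/sh
# Sketch: extract the client IP (field 1) and status code (field 9)
# from a (fabricated) Combined Log Format entry.
line='192.168.1.10 - - [21/Feb/2008:10:15:32 -0500] "GET /index.html HTTP/1.0" 200 5120 "http://www.queensu.ca/" "Mozilla/4.0 (compatible)"'
echo "$line" | awk '{print "client:", $1, "status:", $9}'
# prints: client: 192.168.1.10 status: 200
```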
Access statistics are kept for the main web site on the server, and any virtually hosted web sites running on it. One year's worth of weekly statistics are available via the http://SERVER-URL/stats/ web page, where SERVER-URL is either DEPT.queensu.ca or VHOST-URL. This web page is restricted to Queen's IP addresses. The statistics are generated by the wwwstat program when the log files are rotated.
Access statistics will be kept on the server for two years and then removed. If you want to keep these for archival or historical purposes you should download the contents of the statistics directory to your local computer.
K) Web Utilities
Several useful utilities are available on the server. Weblint is a syntax and minimal style checker for HTML pages. The wwwstat and splitlog scripts can be run against existing log files to generate access statistics for your web pages. The htpasswd command is used to set up userid and password authentication for web content.
In order to use these utilities you must be logged into the server. On the main Queen's web server, these are only available to accounts with login access.