The Frequently Asked Questions are broken down into the following topic areas: General, Content, Access, and Errors.
General
I don't understand the terminology used on this site. What do I do?
We have attempted to define all of the terms used on this site in our glossary.
How do I contact the North Carolina State Government Web Site Archives?
Visit our contact page here.
Content
What is included in the North Carolina State Government Web Site Archives?
The collection consists of over 300 domain names that have been identified and appraised for frequent capture. We crawl every link that is part of the originating domain name including images, text and video.
What file types can be stored and accessed?
Typically, if a file can be downloaded from the web without direct user intervention, it can be stored and accessed. However, we cannot provide access to password-protected files, databases, files that require filling out a form for access, or streaming media. To learn more about file types that cannot be archived, click here.
What does it mean when a site's archive data on the search results page has an asterisk (*) next to the date?
Some web pages are not updated very frequently while others are updated often. When our automated systems crawl the web, we find that only about half of all pages on the web have changed from our previous visit. The asterisk indicates that the content has been updated from the previously archived copy. If you don't see an asterisk next to an archived document, then the content on the archived page is probably identical to the previously archived copy (there may be cases where changes are not identified by the automated system).
Is there an easy way to compare two archived versions of a web site to see what has been changed/updated?
Yes. First, you need to be on the search results page that lists the various dates a particular site was archived. Click “compare archive pages” in the top right portion of the screen. The page will reload with check boxes next to each date the site was archived. Choose the two versions you would like to compare and hit the “compare two dates” button (remember that if you don't see an asterisk next to an archived document, then the content on the archived page is probably identical to the previously archived copy). Any deletions will appear in blue with a line through the text and any additions will appear in green.
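The idea behind the comparison can be illustrated with Python's standard difflib module. This is only a sketch of how deletions and additions between two captures might be identified. It is not the tool used by the archive, the function name summarize_changes is hypothetical, and the sample text is made up.

```python
import difflib

# Made-up text standing in for two archived captures of the same page.
older = ["Office hours: 8am to 5pm", "Contact: Records Branch", "Updated 2005"]
newer = ["Office hours: 8am to 5pm", "Contact: Records Branch",
         "Updated 2006", "New: online forms"]

def summarize_changes(before, after):
    """Return (deleted_lines, added_lines) between two lists of lines."""
    deleted, added = [], []
    for line in difflib.ndiff(before, after):
        if line.startswith("- "):      # present only in the older capture
            deleted.append(line[2:])
        elif line.startswith("+ "):    # present only in the newer capture
            added.append(line[2:])
    return deleted, added

deleted, added = summarize_changes(older, newer)
print("Deleted:", deleted)
print("Added:", added)
```

If both lists come back empty, the two captures are identical line for line, which corresponds to the case where no asterisk is shown.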
How do I know the date a site was archived?
When looking at an archived site, the URL in the web address box identifies the date and time that the page was captured. For example, the URL http://wayback.archive-it.org/194/20051013201830/http://h2o.enr.state.nc.us/ belongs to a Department of Natural Resources, Division of Water Quality web site captured October 13, 2005 at 8:18:30 pm. The date and time are given by the 14-digit number in the middle, which follows the pattern YYYYMMDDHHMMSS with the hour on a 24-hour clock.
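As a minimal sketch of reading the date code, the Python snippet below pulls the 14-digit number out of an archived URL and converts it to a date and time. The function name capture_time is hypothetical, and the pattern assumes URLs shaped like the example above. The value is shown as recorded in the URL, with no time zone conversion.

```python
import re
from datetime import datetime

# Hypothetical helper: extracts the YYYYMMDDHHMMSS date code from an archived URL.
def capture_time(archived_url: str) -> datetime:
    # The date code is the run of exactly 14 digits between two slashes.
    match = re.search(r"/(\d{14})/", archived_url)
    if match is None:
        raise ValueError("no 14-digit date code found in URL")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")

url = "http://wayback.archive-it.org/194/20051013201830/http://h2o.enr.state.nc.us/"
print(capture_time(url))  # 2005-10-13 20:18:30
```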
Why does an archived page display today's date?
If a site contains code to calculate the current date, the current date will appear on the site regardless of the date it was actually added to the collection. You should check the URL to determine the date the page was archived. The date the page was archived is the 14-digit number in the middle of the URL, which follows the pattern YYYYMMDDHHMMSS. For example, the URL http://wayback.archive-it.org/194/20060208191850/http://www.p2pays.org/ shows that the site was crawled on Feb 8, 2006 at 7:18 pm and 50 seconds.
Why isn't the site I am looking for in the archive?
The web sites to be included in the collection have all been carefully evaluated according to a set of standards. If you cannot find the site you are looking for in the archive you should first verify that the site was selected for inclusion in the archive. You can review the list of selected sites here. If the site is not on this list, it does not meet the criteria for capture, either because of content or technological impediments to harvesting. It is also possible that the site owner may have requested that the site not be included in the collection, in which case the State Archives may have obtained a copy of the web site directly from the agency without using the automated crawler. Please contact us to determine if this is the case. If you believe that the site should be in the collection, click here for information on how you can recommend the site.
What types of web content cannot be harvested?
As the crawler visits a site, it gathers and organizes the content that it encounters; this is known as harvesting. However, there are certain types of content that our crawler cannot harvest. These are:
- Robots.txt: A robots.txt file is something that a site owner places on a site to keep crawlers like ours from crawling it. When we encounter a robots.txt file, we stop harvesting the site.
- JavaScript: JavaScript elements are often hard to archive and even harder to display, especially if they generate links without having the full link address in the page. In addition, if JavaScript needs to contact the originating server in order to work, it will fail when archived. If the site contains a lot of JavaScript, the user will be sent to the live web instead. If the site contains only a small amount of JavaScript, such as a counter, the site will display properly but the JavaScript item will not. For example, the counter on the State Fair Web Site on the live web reads, “Only X number of days to the State Fair.” In the Web Site Archives, this counter cannot be displayed and is replaced by a gray box.
- Date Displays: If a site contains code to calculate the current date, the current date will appear on the site regardless of the date it was actually added to the collection. You should check the URL to determine the date the page was archived.
- Server Side Image Maps: Like any functionality on the web that needs to contact the originating server in order to work, a server side image map will fail when archived.
- Streaming Media: This is a one-way transmission over a data network that is played as it is received and is not stored permanently on the requesting computer. While we can’t harvest streaming media, we can harvest downloadable media files.
- Password-Protected Sites: This includes https sites. The crawler cannot collect any site that requires a password or that is database driven, because such a site requires user input.
- Form Driven Content: If you need to fill in a form to get access to the content, the crawler typically cannot retrieve this content without user input.
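The robots.txt exclusion described above can be illustrated with Python's standard urllib.robotparser module. This is a general sketch of how a crawler can check a site's robots.txt rules before fetching a page, not a description of the crawler used for this collection. The robots.txt content, the crawler name "example-crawler", and the example.gov addresses are all placeholders.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real crawler would fetch this from the site.
robots_txt = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Placeholder crawler name and URLs, used only for illustration.
page_allowed = parser.can_fetch("example-crawler", "http://www.example.gov/index.html")
image_allowed = parser.can_fetch("example-crawler", "http://www.example.gov/images/logo.gif")

print("index.html allowed:", page_allowed)
print("images/logo.gif allowed:", image_allowed)
```

In this sketch the rule blocks only the images directory, which is the kind of exclusion that leaves the text of a page available while its images are missing.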
Can I suggest a site for inclusion in the North Carolina State Government Web Site Archives?
Yes. To do so, please contact us with the proposed URL at the highest level that should be captured and an explanation of why you believe the site should be included. We will evaluate the site using the Collection Procedures for State Government Web Sites Using Archive-It (pdf). A response will be returned to you within one week. However, because of certain collection constraints, we may not be able to add proposed sites immediately.
Access
Who has access to the collection?
The materials from our collection are made available to the public for use in research, teaching, and private study, pursuant to the U.S. Copyright Law. The user must assume full responsibility for any use of the materials, including but not limited to, infringement of copyright and publication rights of reproduced materials. Click here to see our full copyright statement.
How can I find what I am looking for in the collection?
We provide full text search capability for all collections. Alternatively, if you know the site you are looking for, you can enter the URL into the search box and view all instances of that archived URL. If you are looking for information captured prior to 2006, you may need to search or browse one of our pilot collections.
Can I download sites from the collection?
We do not prohibit downloading from our collection. However, the user must assume full responsibility for any use of the materials, including but not limited to infringement of copyright and publication rights of reproduced materials. View our full copyright statement here. Whenever materials from our collection are used in a publication or other product, we request that the copy carry a credit line stating “Courtesy of the North Carolina State Archives.”
Errors
Why is the page displaying oddly? Sections are moved around or missing.
Most of the sites captured display best using either Mozilla Firefox or Internet Explorer 7, so check the page in both browsers. Download Firefox here and IE7 here. Other display issues result from frames on a web page; in this case it is just a bug in the Archive. Please note, however, that we are regularly working on resolving these issues.
What does this error message mean?
Below is a list of the main error messages you may encounter while searching the collection. If you see an error message that does not have the Internet Archive Wayback Machine logo in the upper left corner, you are most likely looking at an archived error page or the live web.
Failed Connection: The server on which the requested information is stored is down. Generally these errors clear up within two weeks.
Robots.txt Query Exclusion: A robots.txt file is something that a site owner places on a site to keep crawlers like ours from crawling it. When we encounter a robots.txt file, we stop harvesting the site.
Blocked Site Error: Site owners or copyright holders have requested that the site be excluded from the collection. It is possible that the State Archives obtained a copy of the web site you are looking for directly from the agency without using the automated crawler. Please contact us to determine if the web site is available.
Path Index Error: A path index error refers to a problem in our database where the requested information is not available, generally because of a machine or software issue, although each case can be different. These errors may take time to fix. If you encounter this error message, please alert us by contacting us and identifying the link that you were trying to reach and the page that you were linking from.
Not in Archive: The page you are trying to access is not part of the collection. Web sites to be included in the collection have all been carefully evaluated according to a set of standards. If you cannot find the site you are looking for in the archive, you should first verify that the site was selected for inclusion in the archive. You can review the list of selected sites here. If the site is not on this list, it does not meet the criteria for capture, either because of content or technological impediments to harvesting. It is also possible that the site owner may have requested that the site not be included in the collection. If you believe that the site should be in the collection, click here. Finally, it is possible that a site that should be part of the collection has a redirect on it and the site you are redirected to is not in the collection. If you believe this to be the case, please contact us.
Why was I thrown out to the live web?
Typically this is a JavaScript problem. You may not be able to view web sites that make extensive use of JavaScript for links. Often the interface is unable to convert links generated by JavaScript so that they point to archived pages. Instead, the links go directly to the live web.
Why did I end up on a page captured on a different date when I clicked a link in the archives?
Linked pages are not always archived on the same date. Certain sites are captured more frequently than others, based on the results of a ranking system that evaluates the content contained within that site. If you are following links from one domain to another domain, both in the collection, it is possible the new domain was captured on a different date. In that case, we will display the closest available capture date of the new domain. To make sure you know what version of the web you are looking at, pay attention to the date code embedded in the archived URL. This is the 14-digit number in the middle, which follows the pattern YYYYMMDDHHMMSS. For example, the URL http://wayback.archive-it.org/194/20000229123340/http://www.ncgov.com/ shows that the site was crawled on Feb 29, 2000 at 12:33 pm and 40 seconds.
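Choosing the closest available capture date can be sketched in a few lines of Python. This is an illustration only, under the assumption that "closest" means nearest in time in either direction. The function name closest_capture is hypothetical and the list of capture date codes is made up for the example.

```python
from datetime import datetime

# Illustrative capture date codes for a linked domain (YYYYMMDDHHMMSS).
captures = ["20051013201830", "20060208191850", "20060615120000"]

def closest_capture(requested: str, available: list[str]) -> str:
    """Return the date code in `available` nearest in time to `requested`."""
    fmt = "%Y%m%d%H%M%S"
    target = datetime.strptime(requested, fmt)
    # Compare by absolute time difference, so earlier and later captures both count.
    return min(available, key=lambda code: abs(datetime.strptime(code, fmt) - target))

print(closest_capture("20060301000000", captures))  # 20060208191850
```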
Why can’t I see the images on a site?
Most images display properly in the archives. When there is a small red "x" where the image should be it means that the images are not available in the archives because technological issues prevented the capture of the image content. When an image is grayed out it means that the site owner used robots.txt exclusions to block access to the images directory.
What if a web site owner does not want their site accessible through the collection?
All web sites in the collection have been carefully selected using the Collection Procedures for State Government Web Sites Using Archive-It (pdf). In general we will not honor requests to remove sites unless they come directly from the site owner. Site owners can request manual exclusions of their web sites by contacting us. However, if site owners choose not to participate through this automated method, they will need to arrange for a copy of the site to be delivered to the State Archives. Please see the Guidelines for Maintaining and Preserving Records of Web-Based Activities for more information.