Frequently Asked Questions

Basics
  1. What is a codebook?
  2. Aren't data and statistics pretty much the same thing?
  3. I would like to use a paper codebook instead of an Adobe Acrobat (pdf) version. Does the library have paper codebooks available for checkout?
  4. Are there any costs associated with getting and using raw data?
Technical/Computer related
  5. Trouble with authentication
  6. How do I unzip a file I've downloaded?
  7. What do I do if my computer does not have sufficient disk space to work with large datasets?
  8. What is Bengal?
  9. What is FERRET?
Getting the Best Dataset
  10. What's the best way to go about searching for datasets?
  11. How does ICPSR compare with the Roper Center as a source for data?
  12. Will MU Libraries pay the fees for a dataset I need, even if it's not within ICPSR or Roper's archives?
Training
  13. How can I find out more about SAS and SPSS?
  14. I've heard that there are training opportunities in statistical methods. How can I find out more about this?
  15. Where can I find data websites that offer online analysis?
 

The Basics

Q. What is a codebook?

A. A codebook is the text document that accompanies a dataset and explains the coding system -- that is, the way in which information was assigned digital equivalents in order to be readable by statistical software programs. Codebooks explain each variable in the dataset. If the original instrument is a public opinion survey, the codebook will often provide the exact wording of each question (sometimes even a facsimile of the questionnaire), the name of each variable, the column or relative position of each variable within the dataset, and the numeric coding for each type of response: for example, 01 for strongly agree, 02 for slightly agree, 03 for slightly disagree, etc.

The codebook also provides information on the original data collection project -- the purpose of the original researchers, an explanation of methodology, notes on sample size, field dates, information on how to cite the study, and more.

For further information, see ICPSR's explanation of codebook files.
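
To illustrate how a codebook is used in practice, the Python sketch below decodes one line of a fixed-width data file. The variable names, column positions, and sample record are hypothetical; only the three response codes come from the example above, and any other code is reported as undocumented rather than guessed.

```python
# Hypothetical codebook: variable name -> (start column, end column, value labels).
# Columns are 1-based and inclusive, the way codebooks usually list them.
CODEBOOK = {
    "RESPID": (1, 4, None),  # respondent ID, no value labels
    "Q1": (5, 6, {
        "01": "strongly agree",
        "02": "slightly agree",
        "03": "slightly disagree",
    }),
}


def decode_record(line, codebook=CODEBOOK):
    """Return {variable: decoded value} for one fixed-width line."""
    decoded = {}
    for name, (start, end, labels) in codebook.items():
        raw = line[start - 1:end]  # convert 1-based columns to a Python slice
        if labels is None:
            decoded[name] = raw
        else:
            decoded[name] = labels.get(raw, "undocumented code " + raw)
    return decoded


print(decode_record("001702"))
```

Without the codebook, the record "001702" is just six digits; with it, the same line reads as respondent 0017 answering "slightly agree".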

Q. Aren't data and statistics pretty much the same thing?

A. Though these terms are sometimes used interchangeably in casual conversation, they have distinct meanings in a research context.

In this context, data is the term used to refer to digital files containing coded information. In data files, digits can represent categorical variables (e.g., 1=male, 2=female, 3=don't know), or they might indicate actual quantities. Either way, datasets must be accompanied by a codebook or similar documentation that explains the meaning of the numbers in each position. With the codebook as a tool, a researcher can conduct quantitative analysis and ultimately generate meaningful statistics from the data.

If you would like to read more about data and statistics, see ICPSR's Data Use Tutorial.
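
The distinction can be sketched in a few lines of Python. The list of coded responses below is invented for illustration; the labels follow the 1=male, 2=female, 3=don't know example above. The list is the data, and the percentages computed from it are the statistics.

```python
from collections import Counter

# Hypothetical coded responses, one value per respondent (the "data").
data = [1, 2, 2, 1, 3, 2, 1, 2]

# Labels a codebook would supply for each code.
LABELS = {1: "male", 2: "female", 3: "don't know"}


def percentages(values, labels):
    """Return {label: percent of all values}, rounded to one decimal place."""
    counts = Counter(values)
    total = len(values)
    return {labels[code]: round(100 * counts[code] / total, 1)
            for code in sorted(counts)}


# The summary figures derived from the data (the "statistics").
stats = percentages(data, LABELS)
print(stats)
```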


Q. I would like to use a paper codebook instead of an Adobe Acrobat (pdf) version. Does the library have paper codebooks available for checkout?

A. We have paper codebooks for the popular General Social Survey and for the American National Election Studies. They are shelved in the Reference stacks of Ellis Library. We also collected paper codebooks before ICPSR and Roper started making them available online. Those older codebooks are cataloged in MERLIN and housed in a remote storage facility. They may be checked out.

Q. Are there any costs associated with getting and using raw data?

A. For most users, no. MU Libraries have already paid membership fees for ICPSR and the Roper Center, so MU users can freely download what they find in their archives. In addition, there are a growing number of free, public use datasets on government and non-profit websites. Sometimes researchers find datasets that are available for sale by a commercial organization. Occasionally the libraries are asked if we might pay the costs to acquire them. This question is answered in Q.12, below.

After you have selected a dataset and have begun work on your project, you might discover a need for extra software training or statistical help. The services offered by MU's Social Sciences Statistics Center (SSSC) are free to graduate students. Faculty working on major research projects may be asked to pay a consultation fee.


  

Technical/Computer related

Q. Trouble with authentication:
I get my internet service from a commercial provider, and the ICPSR website isn't recognizing me as an MU affiliate. Is there any way I can authenticate myself so I can download datasets?

A. MU's DoIT offers a Virtual Private Networking service (VPN) that allows you to configure your home computer so that it will appear within MU's IP address range, regardless of your ISP. After you set up VPN on your machine, you should be able to download from ICPSR and Roper without a hitch.

Q. How do I unzip a file I've downloaded?

A. DoIT provides help desk support for the compression utility Power Archiver. The software is available to MU affiliates for free download from DoIT. If you have trouble using Power Archiver, call 882-5000 for assistance. If you have a compressed file that Power Archiver cannot unzip, try 7-Zip.
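
For readers comfortable with scripting, a .zip archive can also be extracted with Python's standard zipfile module. This is a minimal sketch and not a DoIT-supported method; the file and folder names in the usage comment are hypothetical, and the function handles only the .zip format.

```python
import zipfile
from pathlib import Path


def unzip(archive_path, dest_dir):
    """Extract every file in a .zip archive into dest_dir; return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example with a hypothetical download:
# unzip("study_download.zip", "study_download")
```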

Q. What if my computer does not have sufficient disk space to work with large datasets?

A. Back in the 1980s and 1990s when personal computers had limited hard drive space, few people could use desktop machines to work with large datasets. Now that computers have hard drive space measured in gigabytes, and have rewritable CD or zip drives that can hold 700 or 750 megabytes of data, fewer people are running into this problem. Still, MU Libraries' Data Services is ready to help anyone short on disk drive space by placing datasets on Bengal. There is an additional advantage to having data on Bengal, and that is that it's possible to access the free UNIX version of SAS software from there.

Q. What is Bengal?

A. Bengal is MU's main computer server. It provides each MU student, faculty member, and staff member with 150 megabytes of computer storage space for use in statistical analysis, programming, web development, or other purposes.

Q. What is FERRET?

A. FERRET is an acronym for Federal Electronic Research & Review Extraction Tool. It allows users to extract datasets of government information via the web, get documentation on variables, and run cross-tabulations. FERRET is one of a growing number of websites that allow users to conduct simple analysis online. Many researchers use a full-fledged statistical software program like SAS or SPSS to further analyze the datasets they download with FERRET.
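
For readers unfamiliar with the term, a cross-tabulation counts how often each combination of two variables occurs. The Python sketch below shows the idea on a handful of invented records; the variables and values are hypothetical, and the code is not how FERRET itself works.

```python
from collections import Counter

# Hypothetical records: (region, employment status) for six respondents.
records = [
    ("Midwest", "employed"),
    ("Midwest", "unemployed"),
    ("South", "employed"),
    ("South", "employed"),
    ("Midwest", "employed"),
    ("South", "unemployed"),
]


def crosstab(pairs):
    """Count how often each (row value, column value) combination occurs."""
    return Counter(pairs)


table = crosstab(records)
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Print one row per region and one column per employment status.
print("region".ljust(10) + "".join(c.ljust(12) for c in cols))
for r in rows:
    print(r.ljust(10) + "".join(str(table[(r, c)]).ljust(12) for c in cols))
```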


Getting the Best Dataset

Q. What's the best way to go about searching for datasets?

A. Begin by checking the ICPSR and/or Roper web databases. Directions on search strategy for each are available on our ICPSR and Roper pages. If you are unable to find exactly what you need through those two organizations' online catalogs, you can check the web for more datasets. We know of several web portals that will allow you to search for datasets on the web.

Q. How does ICPSR compare with the Roper Center as a source for data?

  • Content. The majority of studies in ICPSR fall within the fields of sociology, political science, economics, criminal justice, business, demography, psychology and education. It also has datasets of interest to health researchers and epidemiologists. Data in ICPSR may have been gathered by any means (surveys, government forms, financial reports, historic newspapers, etc.). The Roper Center archive, on the other hand, holds only data collected through public opinion surveys. Data in both archives is observational only; data gathered through experiments is not represented in either archive.
  • Ease of searchability. The Roper Center catalog has a somewhat primitive search interface. Fortunately, some of the codebooks in Roper Center's archive may be searched by keyword through the more sophisticated iPOLL, which is linked from our library's database list. ICPSR's advanced search interface is powerful and the Subject Thesaurus is also very useful. These can be supplemented with the Social Science Variables Database for searching selected ICPSR datasets by keyword at the variable level.

Q. Will MU Libraries pay the fees for a dataset I need if it's not within ICPSR or Roper's archives?

A. We would first consider whether the dataset may be added to the library's holdings as a tangible item (for example, a CD-ROM), and whether we would be free to lend it out to any MU users who wish to have it. If the purchase agreement allows only a one-time download and the license is written to allow only a single user (or single research team) use of the data for one research project, then the library would be unable to treat it as a library item and we would have to decline the request. If this is the situation, we suggest you seek grant funding to help finance the purchase.

On the other hand, if the data organization operates by providing unlimited data access for a yearly fee, we would consider the level of demand for the data across several departments. If only one department has interest, we might suggest the department purchase it, or work in conjunction with their subject librarian who might cancel journal subscriptions in order to help pay for the dataset subscription. If the library were to assume responsibility for the data archive membership on behalf of several campus departments, we would do so only after careful consideration. Dataset license agreements might require us to ask users to cease their research projects midstream should the time come that budget cuts force us to cancel the data archive subscription. Therefore we would consider the addition of a new data archive membership a long-term commitment, and before signing on the dotted line, we would have to determine whether our budget could support it not only for the current year but also for years to come.


Training

Q. How can I find out more about SAS and SPSS?

A. SAS and SPSS are statistical software programs supported by DoIT on the MU campus. The software is available on all machines in campus computing sites and in the "OASys" set of computers inside Ellis Library.

If you want to be able to use SAS or SPSS on your personal computer, you have two options. You can use the free UNIX version of SAS on Bengal or the UNIX version of SPSS on Bengalstats. Or you can buy SAS or buy SPSS in Windows or Mac versions through DoIT's online order form. New users generally prefer the Windows or Mac interface over the UNIX version.

Q. I've heard that there are training opportunities in statistical methods. How can I find out more?

A. The easiest way to get training is to attend any of DoIT's series of half-day classes in SAS and SPSS. There is no charge for these training sessions. For those who want in-depth training on particular statistical topics, ICPSR's Summer Program in Quantitative Methods offers a wide variety of classes, including 2- to 5-day workshops and more intensive 4- or 8-week classes. Students from MU can take advantage of member rates on tuition, and there are stipends available which could cover some or all expenses. Contact MU's Data Services Librarian for details.

Q. Where can I find data websites that offer online analysis?

A. ICPSR is now offering online analysis for over 200 datasets using the DAS (Data Analysis System). It does not require users to download any files, and it is useful for those who do not have statistical software installed on their personal computers. DAS does allow users to create and download subsets of the data files if they so choose.

Many free data websites also offer online analysis tools. Examples include the Missouri Census Data Center's Uexplore, the University of Chicago's General Social Survey website, the American Religion Data Archive, the National Household Travel Survey and the U.S. Census Bureau's FERRET.