
Frequently Asked Questions


International Assessments
  • U.S. Participation
    • What international assessments does the United States participate in, and what do they measure?
      • The United States participates in the following international assessments:

        PIRLS — Progress in International Reading Literacy Study
        PIRLS is an international comparative study of the reading literacy of young students. PIRLS collects data on the reading achievement, experiences, and attitudes of fourth-grade students in the United States and students in the equivalent of fourth grade in other participating countries, as well as information on students’ classroom and school contexts. PIRLS is organized by the International Association for the Evaluation of Educational Achievement (IEA). PIRLS was first administered in 2001 and is administered every 5 years.

        PISA — Program for International Student Assessment
        PISA is an international comparative study of the reading literacy, mathematics literacy, and science literacy of 15-year-old students. In addition to an assessment of student literacy, PISA collects information on students’ experiences and attitudes, as well as school contexts. PISA is organized by the Organization for Economic Cooperation and Development (OECD), an intergovernmental organization of 31 member countries. Non-OECD-member countries participate as well. PISA was first administered in 2000 and is administered every 3 years.

        TIMSS — Trends in International Mathematics and Science Study
        TIMSS is an international comparative study of student performance in mathematics and science at the fourth and eighth grades. TIMSS collects data on the achievement, experiences, and attitudes of fourth- and eighth-grade students in the United States and students in the equivalent grades in other participating countries, as well as information on classroom and school contexts. TIMSS is organized by the International Association for the Evaluation of Educational Achievement (IEA). TIMSS was first administered in 1995 and is administered every 4 years.

        PIAAC — Program for the International Assessment of Adult Competencies
        PIAAC is an international comparative study of adult literacy, including reading literacy, numeracy, problem-solving in a technology-rich environment, and component reading literacy skills, as well as the skills adults report using in their jobs. In addition to the assessment of adult literacy, PIAAC collects data on adults’ educational and work experiences. PIAAC is organized by the Organization for Economic Cooperation and Development (OECD), an intergovernmental organization of 31 member countries. Non-OECD-member countries participate as well. PIAAC will first be administered in 2011 and is expected to be administered every 10 years.
    • Why does the United States participate in international assessments?
      • The United States participates in international assessments primarily for two reasons:
        • To learn about the performance of U.S. students and adults in comparison to their peers in other countries.
        • To learn about the educational and work experiences of students and adults in other countries.
        Student assessments are a common feature of school systems that are concerned with accountability and with ensuring students' progress throughout their educational careers. National or state assessments enable us to know how well students are doing in a variety of subjects and at different ages and grade levels compared to other students nationally or within their own state. International assessments, on the other hand, offer a unique opportunity to benchmark our students' performance against the performance of students in other countries. Similarly, international assessments of adult literacy enable us to compare U.S. adults with their international peers on literacy skills that support productive adult lives in the workplace and society.

        International assessments of students also enable countries to learn from each other about the variety of approaches to schooling and to identify promising practices and policies to consider in their schools. International assessments of adults enable research on the relationships between adults’ work and educational experiences and their skill levels, both within countries and cross-nationally.
  • Development and Administration
    • How are test questions developed for the international assessments?
      • There are three main components in the development of test questions:
        1. Test questions for each assessment are first developed through a collaborative, international process.
          For each study, an international subject area expert group is convened by the organization conducting the assessment. This expert group drafts an assessment framework (the outline of the topics and skills that should be assessed in a particular subject area), which reflects a multinational consensus on the assessment of a subject area. Based on the framework, national representatives and subject matter specialists develop the test questions or items. The national representatives from each country then review every test item to ensure that each item adheres to the internationally agreed-upon framework. While not every item may be equally familiar to all students, if any item is considered inappropriate for a participating country or an identified subgroup within a country, that item is eliminated.
        2. Test items are field-tested before they are used or administered in the full-scale assessment.
          Before the administration of the assessment, a field test is conducted in the participating countries. An expert panel convenes after the field test to review the results and look at the items to see if any results were biased due to national, social, or cultural differences. If such items exist, they are not included in the full assessment. Only after this thorough process, in which every participating country is involved, are the actual items administered to the students, or adults in the case of PIAAC.
        3. There is an extensive translation verification process.
          All participating countries are responsible for translating the assessment into their own language or languages, unless the original test items are in the language of the country. All countries identify translators to translate the source versions into their own language. External translation companies independently review each country's translations. Instruments are verified twice, once before the field test and again before the main data collection. Statistical analyses of the item data are then conducted to check for evidence of differences in student performance across countries that could indicate a translation problem. If a translation problem with an item is discovered in the field test, it is removed from the full assessment. Since the items for TIMSS, PIRLS, PISA, and PIAAC are provided to countries in English, the United States does not need to translate the assessments but does adapt British English to U.S. English when necessary and may adapt elements of the assessment, as appropriate.
        For more details about test development, see http://nces.ed.gov/programs/coe/2009/analysis/appa3.asp.
    • Who takes the international assessments?
      • A representative national sample of students at the target age or in the target grade in school in each participating country takes each assessment. In the case of PIAAC, the sample is drawn to be representative of persons 16 to 65 years old living in households.

        The international organization that conducts each study verifies that all participating countries select a nationally representative sample of students or adults. To ensure comparability, target grades or ages are clearly defined. For example, in TIMSS at the upper grade level, countries are required to sample students in the grade that corresponds to the end of 8 years of formal schooling, providing that the mean age of the students at the time of testing is at least 13.5 years.

        Not all selected respondents choose to participate in the assessment, and certain respondents, such as some with cognitive or physical disabilities, may not be able to take the assessment. Thus the sponsoring international organizations check each country's participation rates and exclusion rates to ensure they meet established target rates in order for the country's results to be reported.

        For more details about international requirements for sampling and response rates, see http://nces.ed.gov/programs/coe/2009/analysis/appa2.asp.
    • How can we be sure that countries administer the test in the same way?
      • The short answer is that procedures for the administration of the international assessments are standardized and independently verified.

        The international organizations that conduct international assessments require compliance with standardized procedures. Manuals are provided to each country that specify the standardized procedures that all countries must follow on all aspects of assessment sampling, preparation, administration, and scoring. To further ensure standardization, independent international quality control monitors visit a sample of schools (or households in the case of PIAAC) in each country. In addition, the countries themselves organize their own quality control monitors to visit an additional number of schools (or households in the case of PIAAC). Results for countries that fail to meet the international requirements are footnoted with explanations of the specific failures (e.g., "only met guidelines for sample participation rates after substitute schools were included"), are shown separately in the international reports (e.g., listed in a separate section at the bottom of a table), or are omitted from the international reports and datasets (as happened to the Netherlands' PISA results in 2000, the United Kingdom's PISA results in 2003, and Morocco's TIMSS 2007 results at grade 8).
    • Are schools and students required to participate in these assessments?
      • To our knowledge, no countries require all schools and students to participate in PIRLS, PISA, or TIMSS. However, some countries give more prominence to these assessments than do others. In the United States, participation by schools, teachers, and students in international assessments is voluntary.
  • Issues of Validity and Reliability
    • How different are assessment test questions from what students are expected to learn in the classroom?
      • The answer varies from study to study. Some studies, like TIMSS, are curriculum-based and are designed to assess what students have been taught in school using multiple-choice and open-ended (or short answer) test questions. Other studies, like PISA and PIAAC, are “literacy” assessments, designed to measure performance in certain skill areas at a broader level than the school curriculum.
    • How do international assessments deal with the fact that education systems around the world are so different?
      • The fact that education systems are different across countries is one of the main reasons we are interested in making cross-country comparisons. However, these differences make it essential to carefully define the target populations to be compared, so that comparisons are as fair and valid as possible. Depending in large part on when students first start school, students at a given age may have more or less schooling in different countries, and students in a given grade may be of different ages in different countries. In every case, detailed information on the comparability of the sampled populations is published for review and consideration.

        For PIRLS, the target population represents students in the grade that corresponds to 4 years of formal schooling, counting from the first year of schooling as defined by the International Standard Classification of Education (ISCED), Level 1. This corresponds to fourth grade in most countries, including the United States. This population represents an important stage in reading development.

        In TIMSS, the two target populations are defined as follows: (1) all students enrolled in the grade that corresponds to 4 years of formal schooling—fourth grade in most countries—providing that the mean age at the time of testing is at least 9.5 years, and (2) all students enrolled in the grade that corresponds to 8 years of formal schooling—eighth grade in most countries—providing that the mean age at the time of testing is at least 13.5 years. At grade four in 2007, only England, Scotland, and New Zealand included students who had 5 years of formal schooling at the time of testing. At grade eight, England, Malta, Scotland, and Bosnia and Herzegovina included students who had 9 years of formal schooling at the time of testing. In addition, at grade eight, the Russian Federation and Slovenia included some students who had less than 8 years of formal schooling. However, in all of these cases, the assessed students were of comparable average age to those participating in other countries.

        Another approach, used in PISA, is to designate a target population as students of a particular age (15 years in PISA), regardless of grade. Both approaches are suited to addressing the particular research questions posed by the assessments. The focus of TIMSS and PIRLS is on content as commonly expected to be taught in classrooms, while PISA emphasizes the skills and knowledge that students have acquired throughout their education both in and out of school.
    • Do international assessments take into account that student and adult populations vary in participating countries—for example, the United States has higher percentages of immigrant students and adults than some other countries?
      • Each country has different population characteristics, but the point of international assessments is to measure as accurately as possible the levels of achievement or proficiency of each participating country’s target population. Differences in the levels of achievement or proficiency among students or adults in different countries may be associated with variations in population characteristics, but they may also be due in part to differences in curriculum, teacher preparation, and other educational or societal factors.
    • What if countries select only their best students to participate? Won't they look better than the rest?
      • Countries cannot independently select the students who will take the test. Students are sampled, but the sampling of schools and students is carefully planned and monitored by the sponsoring international organizations.

        Sampling within countries proceeds as follows:
        A sample of schools in each country is selected randomly from lists of all schools in the country that have students in the particular grade or of the particular age to be assessed. Samples for each country are verified by an international sampling referee. Once the sample of schools is selected, each country must contact these original schools to solicit participation in the assessment. Countries are not allowed to switch schools from the list; doing so can result in the exclusion of their data from the reports.

        Every study establishes response rate targets of selected schools (and students) that countries must meet in order to have their data reported. If the response rate target is not met, countries may be able to assess students from substitute schools following international guidelines. For example, PIRLS and TIMSS guidelines specify that substitute schools be identified at the time the original sample is selected by assigning the two schools neighboring the sampled school on the sampling frame as substitutes. If the original school declines to participate, the first of the two substitute schools is contacted; if it declines, the second substitute school is contacted. If it also declines, no other substitute school may be used. If one of the two substitute schools accepts, there are still several constraints on its participation in order to prevent bias. If participation levels, even using substitute schools, still fall short of international or national guidelines, a special non-response bias analysis is conducted to determine whether the schools that did not participate differ systematically from the schools that did. If the analysis does not show evidence of bias, the data for a country may still be included in the reporting of results for the international assessment, but the problem of participation rates is noted.
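
        To make the substitute-school rule concrete, here is a minimal Python sketch, an illustration only and not the studies' operational code, of how the two schools neighboring a sampled school on the sorted frame serve as its first and second substitutes:

            # Illustrative sketch: 'frame' is a hypothetical school sampling frame,
            # already sorted by the variables used for implicit stratification.
            def assign_substitutes(frame, sampled_index):
                first = frame[sampled_index - 1] if sampled_index > 0 else None
                second = frame[sampled_index + 1] if sampled_index + 1 < len(frame) else None
                return first, second

        Because neighbors on the sorted frame share the stratification characteristics of the sampled school, this choice of substitutes limits, though it cannot eliminate, the potential for nonresponse bias.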

        Once a sample of schools agrees to participate, the schools are asked to provide a list of all students of the target age or a list of a particular kind of class (for example, all grade 4 classrooms) within the school. From those lists, a group or whole class of students is then randomly selected for the assessment. No substitutions for the students randomly selected are allowed. However, some individual students may be excluded. Each study establishes a set of guidelines for excluding individual students from assessment. Typically, if a student has a verifiable cognitive or physical disability, he or she can be excluded from assessment. However, total student exclusions (at the school level and within schools) may not exceed established levels and are reported in international publications. For example, the sampling standards used in PISA permit countries to exclude up to a total of 5 percent of the relevant population for approved reasons. In the United States, the overall exclusion rate in PISA 2006 was 4.28 percent.

        Exclusions can take place at the school level (e.g., excluding very small schools or those in remote regions) and the student level. Students can be excluded if they are functionally disabled, intellectually disabled, or have insufficient language proficiency. This determination is made on the basis of information from the school, although the contractors implementing the study also look out for ineligible students who may make it through the screening process. Students cannot be excluded solely because of low proficiency or normal discipline problems.
  • Reported Results
    • Are scores of individual students or adults reported or available for analysis?
      • No. The assessment methods used in international assessments only produce valid scores for groups, not individuals.
    • Can you use the international data to report scores for states?
      • No. The U.S. data are typically representative of the nation as a whole but not of individual states. Drawing a sample that is representative of all 50 individual states would require a much larger sample than the United States currently draws for international assessments, requiring considerable amounts of additional time and money.

        A state may elect to participate in an international assessment as an individual jurisdiction, in which case a sample is drawn that is representative of that state. To date, no states have participated in PIRLS, PISA, or PIAAC as individual jurisdictions. However, several states have participated in TIMSS that way, most recently Massachusetts and Minnesota. These two states independently funded their participation.
    • Can you compare scores from one study to another?
      • Scores can be compared from one round of an assessment to another round of the same assessment (e.g., TIMSS 1999 to TIMSS 2007), but they typically cannot be directly compared from one study to another (e.g., TIMSS to PISA or NAEP) without special studies to link the different assessments.
    • Can you compare scores between grades—for example, between grade 4 and grade 8 scores on TIMSS?
      • No. The assessments for each grade are scaled separately, so the scores cannot be directly compared in a meaningful way. Only scores from different rounds of the same assessment (e.g., 2003 TIMSS grade 4 and 2007 TIMSS grade 4) can be compared.
    • Why does the United States report different findings for the same subjects from different international assessments?
      • At times, different assessments report different findings for the same subject. One obvious factor to consider when examining findings across assessments is that the grade or age levels of the students assessed may differ. Another factor is that studies differ in the specific subject matter or skills emphasized (e.g., reading, mathematics, science). A third factor that can affect the U.S. position relative to other countries is the set of countries involved in a study: the United States may appear to perform better or worse depending on the number and competitiveness of the other participating countries.
    • Why don't TIMSS, PISA, and PIRLS report differences between U.S. students and other countries' students based on race/ethnicity?
      • There are certain demographic characteristics that are not meaningful across countries. Race/ethnicity is one of these. In the United States, race and ethnicity are highly correlated with education and socioeconomic status, which makes them meaningful categories for analysis. While that is also true in other countries, the racial and ethnic categories used to classify people vary from country to country.
  • About PIAAC (the Program for the International Assessment of Adult Competencies)
    • What is assessed in PIAAC?
      • PIAAC is designed to assess adults over a broad range of abilities—from simple reading to complex problem-solving skills. All countries that participate in PIAAC will assess the domains of literacy and numeracy in both a paper-and-pencil mode and a computer-administered mode. In addition, countries may assess problem solving administered on a computer as well as components of reading (administered only in paper-and-pencil format). The United States will assess all four domains.
    • How does PIAAC select a representative sample of adults?
      • Countries that participate in PIAAC must draw a sample of individuals ages 16 to 65 that represents the entire population of adults living in households in the country. Some countries draw their samples from their national registries of all persons in the country; others draw their samples from census data. In the United States, a nationally representative household sample was drawn from the most current Census Bureau population estimates.

        The U.S. sample design employed by PIAAC is generally referred to as a four-stage stratified area probability sample. This method involves the selection of (1) primary sampling units (PSUs) consisting of counties or groups of contiguous counties, (2) secondary sampling units (referred to as segments) consisting of area blocks, (3) dwelling units (DUs), and (4) eligible persons (the ultimate sampling unit) within DUs. Random selection methods are used, with calculable probabilities of selection at each stage of sampling. This sample design ensures the production of reliable statistics for a minimum of 5,000 completed cases.
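
        As a rough illustration of the four nested stages described above, the sketch below walks a single draw through a hypothetical nested frame; the actual PIAAC selection uses probability-proportional-to-size methods with calculable selection probabilities at each stage, not the simple random choices shown here.

            import random

            def four_stage_draw(frame):
                # 'frame' is a hypothetical nested mapping:
                # PSU (county or county group) -> segment (area block) -> dwelling unit -> eligible persons
                psu = random.choice(list(frame))                # stage 1: primary sampling unit
                segment = random.choice(list(frame[psu]))       # stage 2: segment
                du = random.choice(list(frame[psu][segment]))   # stage 3: dwelling unit
                return random.choice(frame[psu][segment][du])   # stage 4: eligible person (ages 16-65)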
    • How does PIAAC differ from international student assessments?
      • As an international assessment of adult competencies, PIAAC differs from student assessments in several ways. PIAAC assesses a wide range of ages (persons between the ages of 16 and 65) whereas student assessments target a specific age (e.g., 15-year-olds in the case of PISA) or grade (e.g., grade 4 in PIRLS). PIAAC is a household assessment (i.e., an assessment administered in individuals’ homes), whereas the international student assessments (PIRLS, PISA, and TIMSS) are conducted in schools. The skills that are measured in each assessment also differ based on the goals of the assessment. Both TIMSS and PIRLS are curriculum-based and are designed to assess what students have been taught in school in specific subjects, such as science or mathematics, using multiple-choice and open-ended test questions. In contrast, PIAAC and PISA are “literacy” assessments, designed to measure performance in certain skill areas at a broader level than school curricula. So while TIMSS and PIRLS aim to assess particular academic knowledge that students are expected to be taught at particular grades, PISA and PIAAC encompass a broader set of skills that students and adults have acquired throughout life.
    • How do PIAAC and PISA compare?
      • PISA and PIAAC both emphasize knowledge and skills in the context of everyday situations, asking students and adults to perform tasks that involve interpretation of real-world materials as much as possible. PISA is designed to show the knowledge and skills 15-year-olds have accumulated within and outside of school. It is intended to provide insight into what students who are about to complete compulsory education and continue with further education or potentially enter the workforce know and are able to do. PIAAC focuses on adults who are already eligible to be in the workforce, and aims to measure the set of literacy skills an individual needs to have in order to function successfully in society. Therefore, PIAAC is not measuring the academic skills or knowledge adults may have learned in school. The PIAAC assessment focuses on tasks adults may encounter in their lives at home, work, or in their community.
    • How does PIAAC differ from earlier adult literacy assessments, like IALS and ALL?
      • PIAAC expands on knowledge and experiences gained from previous international adult assessments—the International Adult Literacy Survey (IALS) and the Adult Literacy and Lifeskills Survey (ALL). PIAAC improves and expands on these previous assessments’ cognitive frameworks, and has added an assessment of problem solving via computer, which was not a component of the previous studies. In addition, PIAAC is capitalizing on prior experiences with IALS and ALL in its approach to survey design and sampling, measurement, data collection procedures, data processing, and weighting and estimation. Finally, the most significant difference between PIAAC and previous large-scale literacy assessments, including IALS and ALL, is that PIAAC is administered on laptop computers, and is designed to be a computer-adaptive assessment so respondents will receive groups of items targeted to their performance levels.
  • About PIRLS (the Progress in International Reading Literacy Study)
    • What aspects of reading literacy are assessed in PIRLS?
      • PIRLS focuses on three aspects of reading literacy:
        • purposes for reading;
        • processes of comprehension; and
        • reading behaviors and attitudes.
        PIRLS is administered near the end of the school year: typically between March and May of the study year for countries in the Northern Hemisphere, and in October and November of the preceding year for countries in the Southern Hemisphere. Thus, for PIRLS 2011, countries in the Northern Hemisphere will conduct the assessment between March and May 2011, while countries in the Southern Hemisphere conducted the assessment in October and November 2010.
    • How many U.S. schools and students participated in previous PIRLS cycles?
      • Assessment year    Participating students    Participating schools    Overall weighted response rate (percent)
        2001               3,763                     174                      83
        2006               5,190                     183                      82

        NOTE: The overall weighted response rate is the product of the school participation rate, after replacement, and the student participation rate, after replacement.
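
        As a worked illustration of the NOTE above, with hypothetical component rates (the table reports only their product):

            school_rate = 0.88    # hypothetical school participation rate, after replacement
            student_rate = 0.94   # hypothetical student participation rate, after replacement
            print(round(school_rate * student_rate * 100))  # 83, an overall weighted response rate in percent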
    • How does PIRLS select a representative sample of students?
      • To provide valid estimates of student achievement and characteristics, PIRLS selects a sample of students that represents the full population of students in the fourth grade. This population was defined internationally as all students enrolled in the grade corresponding to the fourth year of formal schooling, beginning with the first year of schooling as defined by the International Standard Classification of Education (ISCED), Level 1.

        The sample design employed by the PIRLS 2006 assessment is generally referred to as a three-stage stratified cluster sample. Schools were selected at the first stage with probability proportional to size (PPS), size being the estimated number of students enrolled in the target grade. The second-stage sampling units were classrooms within sampled schools. Countries were required to randomly select a minimum of one eligible classroom per target grade per school from a list of eligible classrooms prepared for each target grade. The third-stage sampling units were students within sampled classrooms. Generally, all students in a sampled classroom were to be selected for the assessment. However, it was possible to sample a subgroup of students within a classroom, but only after consultation with Statistics Canada, the organization serving as the sampling referee.

        In 2006, PIRLS guidelines called for a minimum of 150 schools to be sampled per grade, with a minimum of 4,000 students assessed per grade. The school response rate target was 85 percent for all countries. A minimum participation rate of 50 percent of schools from the original sample of schools was required for a country’s data to be included in the international database. The response rate target for classrooms was 95 percent, and the target student response rate was set at 85 percent, from both original and substitute schools.

        U.S. sampling frame
        The PIRLS U.S. sample is drawn from the Common Core of Data (CCD) listing of public schools supplemented with the Private School Universe Survey (PSS) listing of private schools. The combination of these national listings has proven to be close to 100 percent complete.

        U.S. sampling design
        The U.S. 2006 PIRLS sample used a three-stage stratified cluster sampling design. While the U.S. sampling frame was not explicitly stratified, it was implicitly stratified (that is, sorted for sampling) by four categorical stratification variables: type of school (public or private); region of the country (Northeast, Central, West, Southeast); community type (eight levels); and minority status (above or below 15 percent of the student population).

        The first stage made use of a systematic PPS technique to select schools for the original sample. The second stage consisted of selecting fourth-grade classes within each participating school. All students in sampled classrooms were selected for the assessment. In this way, the overall sample design for the United States was intended to approximate a self-weighting sample of students as much as possible, with each fourth-grade student having an equal probability of selection.
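
        The first-stage school selection can be illustrated with a minimal systematic PPS sketch in Python; the field name and records are assumptions for illustration, and operational details such as certainty selections are omitted.

            import random

            def systematic_pps(schools, n):
                # 'schools': the frame, already sorted by the implicit stratification
                # variables; each record carries the school's estimated enrollment in
                # the target grade, which serves as the measure of size.
                total = sum(s["enrollment"] for s in schools)
                interval = total / n                 # sampling interval
                point = random.uniform(0, interval)  # random start
                sample, cumulative = [], 0.0
                for school in schools:
                    cumulative += school["enrollment"]
                    while point <= cumulative and len(sample) < n:
                        sample.append(school)        # larger schools are more likely to be hit
                        point += interval
                return sample

        Because each school's selection probability is proportional to its enrollment, taking whole classrooms at the later stages yields the approximately self-weighting student sample described above.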

        Substitute schools
        Countries are allowed to use substitute schools (selected during the sampling process) to increase the response rate once the 50 percent minimum participation rate among originally sampled schools is reached. In accordance with PIRLS guidelines, substitute schools are identified by assigning the two schools neighboring the sampled school in the frame as substitutes to be used in instances where an original sampled school refuses to participate. Substitute schools are required to be in the same implicit stratum (i.e., have similar demographic characteristics) as the sampled school.
    • Have there been changes in the countries participating in PIRLS?
      • Yes. The composition of participating countries in PIRLS changed somewhat from 2001 to 2006, as some countries dropped out and others joined.

        Thirty-five countries from around the world participated in PIRLS 2001, and 38 countries plus 5 participating Canadian provinces and 2 separate jurisdictions of Belgium (Flemish and French) participated in PIRLS 2006. The United States was one of 29 jurisdictions to participate in both the 2001 and 2006 administrations of PIRLS.
    • If the makeup of the countries changes across the years, how can one compare countries to the PIRLS scale average?
      • PIRLS scores are reported on a scale from 0 to 1,000, with the scale average fixed at 500 and a standard deviation of 100. The PIRLS scale average was set in 2001 and reflects the combined proficiency distribution of all students in all jurisdictions participating in 2001. To allow comparisons between 2001 and 2006, scores of students in jurisdictions that participated in both 2001 and 2006 (29 jurisdictions) were used to scale the 2006 results. The 2006 scores were linked to the 2001 scale using items common to both assessments. Once scores from the 2006 assessment were scaled to the 2001 scale, scores of students in jurisdictions that participated in 2006 but not in 2001 were placed on the PIRLS scale.
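
        The fixed 500/100 scale can be illustrated with a simplified linear standardization; the proficiency values below are hypothetical, and the operational PIRLS procedure uses IRT scaling with common items rather than this two-line transformation.

            import statistics

            # Hypothetical proficiency estimates for the combined distribution of
            # all students in all jurisdictions participating in 2001.
            theta_2001 = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.4]
            mu = statistics.mean(theta_2001)
            sigma = statistics.pstdev(theta_2001)

            def to_pirls_scale(theta):
                # Fix the 2001 combined distribution at mean 500, SD 100; later
                # cycles are linked back to this same scale via common items.
                return 500 + 100 * (theta - mu) / sigma

            print(round(to_pirls_scale(0.9)))  # -> 587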
    • How does PIRLS compare to the NAEP fourth-grade reading assessment?
      • Two studies have compared PIRLS and NAEP in terms of their measurement frameworks and the reading passages and questions included in the assessments. The first study, A Content Comparison of the NAEP and PIRLS Fourth-Grade Reading Assessments, compared PIRLS 2001 with NAEP, and the second study, Comparing PIRLS and PISA with NAEP in Reading, Mathematics, and Science, compared PIRLS 2006 with PISA and NAEP. The studies found the following similarities and differences:

        Similarities
        • PIRLS and NAEP call for students to develop interpretations, make connections across text, and evaluate aspects of what they have read.
        • PIRLS and NAEP use literary passages drawn from children's storybooks and informational texts as the basis for the reading assessment.
        • PIRLS and NAEP use multiple-choice and constructed-response questions with similar distributions of these types of questions.
        Differences
        • PIRLS reading passages are, on average, shorter than fourth-grade NAEP reading passages.
        • Results of readability analyses suggest that the PIRLS reading passages are easier than the NAEP passages.
        • PIRLS and NAEP differ with respect to the types of interpretation and evaluation that are asked of students. PIRLS calls for more text-based interpretation than NAEP. NAEP places more emphasis on having students take what they have read and make connections to other readings or knowledge, and to critically evaluate what they have read.
    • When are PIRLS data collected?
      • PIRLS operates on a five-year cycle, with 2001 being the first year it was administered. For PIRLS 2006, countries in the Northern Hemisphere conducted the assessment between March and May, 2006. In the United States, data collection began slightly earlier and ended in early June. Countries in the Southern Hemisphere conducted the assessment in October and November, 2005. In both hemispheres the assessment is conducted near the end of the school year.
    • Where can I get a copy of the PIRLS U.S. Report?
    • When is PIRLS scheduled to be administered next?
      • PIRLS is scheduled to be administered next in 2011, with results to be reported at the end of 2012.
  • About PISA (the Program for International Student Assessment)
    • What are the components of PISA?
      • Assessment
        In 2009, PISA was a paper-and-pencil assessment that measured 15-year-old students' capabilities in reading, mathematics, and science literacy. Each student took a two-hour assessment. Assessment items included a combination of multiple-choice and open-ended questions that required students to come up with their own responses.

        Questionnaires
        In 2009, students completed a 30-minute student questionnaire about themselves. In addition, the principal of each participating school completed a 30-minute questionnaire about the school.
    • What subject areas are assessed in PISA?
      • PISA measures student performance in reading literacy, mathematics literacy, and science literacy. Conducted every 3 years, each PISA data collection effort assesses one of these three subject areas in depth, although all three are assessed in each cycle. The subject covered in depth is considered the major subject area, and the other two subjects are considered minor subject areas for that assessment year. Assessing all three areas allows participating jurisdictions to have an ongoing source of achievement data in every subject area, while rotating one area as the main focus over the years.

        PISA administration cycle

        Assessment year    Subjects assessed
        2000               READING, Mathematics, Science
        2003               Reading, MATHEMATICS, Science, Problem solving
        2006               Reading, Mathematics, SCIENCE
        2009               READING, Mathematics, Science
        2012               Reading, MATHEMATICS, Science, Problem solving
        2015               Reading, Mathematics, SCIENCE
        NOTE: Reading, mathematics, and science literacy are all assessed in each assessment cycle of the Program for International Student Assessment (PISA). A separate problem-solving assessment was administered in 2003 and is planned for 2012. The subject in all capital letters is the major subject area for that cycle.

        In 2000, reading literacy was the major domain, covering two-thirds of the testing time. In addition to a combined reading literacy score, results were reported for three reading subscales: retrieving information, interpreting texts, and reflecting on texts. In 2003, mathematics literacy was the major domain and scores were reported on a combined mathematics literacy scale and four subscales: space and shape, change and relationships, quantity, and uncertainty. In 2006, science literacy was the major domain and scores were reported on a combined science literacy scale and three subscales: identifying scientific issues, explaining phenomena scientifically, and using scientific evidence. In 2009, reading literacy was again the main focus. More information on the PISA assessment frameworks can be found at the OECD website.
    • How many U.S. schools and students participated in previous PISA cycles?
      • Assessment year    Participating students    Participating schools    School response rate, original/with substitutes (percent)    Overall student response rate (percent)
        2000               3,700                     145                      56/70                                                         85
        2003               5,456                     262                      65/68                                                         83
        2006               5,611                     166                      69/79                                                         91
        2009               5,233                     165                      68/78                                                         87
    • How does PISA select a representative sample of students?
      • To provide valid estimates of student achievement and characteristics, PISA selects a sample of students that represents the full population of 15-year-old students in each participating country and jurisdiction. This population is defined internationally as 15-year-olds attending both public and private schools in grades 7-12. PISA requires a minimum of 4,500 students from a minimum of 150 schools in each participating country and jurisdiction. Within schools, a sample of 35 students is selected with equal probability, unless fewer than 35 students age 15 are available (in which case all students are selected). PISA requires that students in the sample be 15 years and 3 months to 16 years and 2 months old at the beginning of the testing period. The school response rate target is 85 percent for all countries and jurisdictions. A minimum participation rate of 65 percent of schools from the original sample of schools is required for a country or jurisdiction’s data to be included in the international database. PISA also requires a minimum participation rate of 80 percent of sampled students from schools within each country and jurisdiction.
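
        A minimal sketch of the within-school student-selection rule just described; the function and variable names are illustrative.

            import random

            def sample_pisa_students(eligible_students, target=35):
                # Equal-probability sample of 35 students from a school's list of
                # age-eligible students; all are selected if fewer than 35 are listed.
                if len(eligible_students) <= target:
                    return list(eligible_students)
                return random.sample(eligible_students, target)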

        U.S. sampling frame
        The PISA U.S. sample is drawn from the Common Core of Data (CCD) listing of public schools supplemented with the Private School Universe Survey (PSS) listing of private schools. The combination of these national listings has proven to be close to 100 percent complete.

        U.S. sampling design
        In 2009, the design for sample selection was a two-stage process, with the first stage a sample of schools and the second stage a sample of students within schools. The design is a stratified systematic sample, with sampling probabilities proportional to measures of school size. The frame is implicitly stratified (i.e., sorted for sampling) by stratification variables. In 2009, the PISA sample was stratified into eight explicit groups based on control of school (public or private) and region of the country (Northeast, Central, West, Southeast). Within each stratum, the frame was implicitly stratified by five categorical stratification variables: grade range of the school (five categories), type of location relative to populous areas (four categories), the first three digits of the Zip Code, percentage of minority students (above or below 15 percent), and estimated enrollment of 15-year-olds.

        Substitute schools
        Participating countries and jurisdictions are allowed to use substitute schools (selected during the sampling process) to increase the response rate once the 65 percent minimum participation rate is reached. In accordance with PISA guidelines, substitute schools are identified by assigning the two schools neighboring the sampled school in the frame as substitutes to be used in instances where an original sampled school refuses to participate. Substitute schools are required to be in the same implicit stratum (i.e., have similar demographic characteristics) as the sampled school.

    • Which countries participate in PISA?
        • PISA 2000: 43 countries and education systems participated (11 of these administered PISA 2000 in 2002).
        • PISA 2003: 41 countries and education systems participated.
        • PISA 2006: 57 countries and education systems participated.
        • PISA 2009: 65 countries and education systems participated.
        The list of countries and education systems that participated in each PISA cycle is available at: http://nces.ed.gov/surveys/pisa/countries.asp.
    • How does PISA differ from other international assessments?
      • PISA differs from other international assessments in several ways:

        Content
        PISA is designed to measure "literacy" broadly, while other studies, such as TIMSS and NAEP, have a stronger link to curriculum frameworks and seek to measure students' mastery of specific knowledge, skills, and concepts. The content of PISA is drawn from broad content areas, such as space and shape for mathematics, in contrast to more specific curriculum-based content such as geometry or algebra.

        Tasks
        In addition to the differences in purpose and age coverage between PISA and other international comparative studies, PISA differs from other assessments in what students are asked to do. PISA focuses on assessing students' knowledge and skills in reading, mathematics, and science literacy in the context of everyday situations. That is, PISA emphasizes the application of knowledge to everyday situations by asking students to perform tasks that involve interpretation of real-world materials as much as possible. Analyses based on expert panels' reviews of mathematics and science items from PISA, TIMSS, and NAEP indicate that PISA items require multi-step reasoning more often than either TIMSS or NAEP. These analyses also show that PISA mathematics and science literacy items often involve the interpretation of charts and graphs or other "real world" material. These tasks reflect the underlying assumption of PISA: as 15-year-olds begin to make the transition to adult life, they need not only to comprehend what they read or retain particular mathematical formulas or scientific concepts, but also to know how to apply their knowledge and skills in the many different situations they will encounter in their lives.

        A recent study comparing the PISA and NAEP (grades 8 and 12) reading assessments found that both view reading as a constructive process and measure similar cognitive skills. There are differences between them, though, reflecting in part the different purposes of the assessments. First, NAEP has longer reading passages than PISA and asks more questions about each passage, which its longer passages make possible. With regard to cognitive skills, NAEP places more emphasis on critiquing and evaluating text, while PISA places more emphasis on locating information. NAEP also measures students’ understanding of vocabulary in context, whereas PISA does not include any questions of this nature. Finally, NAEP has a greater emphasis on multiple-choice items than PISA, and the nature of the open-ended items differs: PISA's open-ended items call for less elaboration and support from the text than do NAEP's.

        To learn more about the differences in the respective approaches to the assessment of mathematics, science, and reading among PISA, TIMSS, and NAEP, see the comparison studies cited elsewhere in this FAQ.

        Age-based sample
        The goal of PISA is to represent outcomes of learning rather than outcomes of schooling. By placing the emphasis on age, PISA intends to show what 15-year-olds have learned inside and outside of school throughout their lives, not just in a particular grade. Focusing on age 15 provides an opportunity to measure broad learning outcomes while all students across the many participating nations are still required to be in school. Finally, because years of education vary among countries and jurisdictions, choosing an age-based sample makes comparisons across countries and jurisdictions somewhat easier.

        Information collected
        The kind of information PISA collects also reflects a policy purpose somewhat different from the other assessments. PISA collects only background information related to general school context and student demographics. This differs from other international studies such as TIMSS, which also collects background information related to how teachers in different countries approach the task of teaching and how the approved curriculum is implemented in the classroom. The TIMSS video studies further extend this work by capturing images of instruction across countries. The results of PISA will certainly inform education policy and spur further investigation into differences within and between countries and jurisdictions, but PISA is not intended to provide direct information about improving instructional practice in the classroom. The purpose of PISA is to generate useful indicators to benchmark performance and inform policy.
    • How does the performance of U.S. students in mathematics and science on PISA compare with U.S. student performance on TIMSS?
      • The performance of U.S. students in grade 8 on TIMSS 2007, the most recent administration of TIMSS, showed U.S. average scores higher than the TIMSS scale average in both mathematics and science. In PISA 2009, the average scores of U.S. 15-year-old students were below the OECD average—the average score of students in the 30 Organization for Economic Cooperation and Development (OECD) countries—in mathematics and not measurably different from the OECD average in science. Such differences are difficult to compare because, while both TIMSS and PISA measure the mathematics and science achievement of students, they do so for different sets of students, in different ways, and in different sets of countries.

        TIMSS focuses on the mathematics and science achievement of students in the fourth and eighth grades. In contrast, PISA aims to assess the mathematics and science literacy of students near the end of their compulsory schooling, at age 15. These students range across several grades in most countries.

        TIMSS draws its content directly from the school curriculum and is designed to assess how well students have learned what they have ostensibly been taught. TIMSS emphasizes the links between achievement, mathematics and science curricula, and classroom practices. In contrast, PISA’s intent is to measure a "yield" of the skills and competencies accumulated and applied in real-world contexts by students at age 15. PISA emphasizes the mastery of processes, understanding of concepts, and application of knowledge. It draws not only from school curricula but also from learning that may occur outside of school. PISA does not explicitly examine mathematics and science curricula and classroom practices, though it does collect school information, including school background information and information on school practices and resources.

        Both assessments cover much of the world and include key economic partners and competitors, but there is only partial overlap between the sets of participating countries. For instance, only 27 of the 48 countries that participated in TIMSS 2007 at grade 8 participated in PISA 2009. Comparing the PISA countries with the TIMSS countries highlights the different sets of countries participating in each study. For example, European countries make up about two-thirds of all PISA countries but only one-third of TIMSS countries, and Middle-Eastern countries make up about 3 percent of all PISA countries but 25 percent of TIMSS countries. About 25 percent of TIMSS countries participate in PISA, and about one-half of PISA countries are in TIMSS as well.
    • When are PISA data collected in the United States?
      • PISA operates on a 3-year cycle, with 2000 being the first assessment year. For PISA 2000, the U.S. data collection began in April and ended in May. For PISA 2003, the U.S. data collection was conducted in the spring (the same as in 2000) and again in the fall, beginning in September and ending in November. For PISA 2006 and 2009, the U.S. data collection was conducted only in the fall (September–November).
    • Where can I get a copy of the U.S. PISA reports?
    • When is PISA next scheduled to be administered?
      • PISA is scheduled to be administered next in 2012, with results to be reported at the end of 2013.
  • About TIMSS (the Trends in International Mathematics and Science Study)
    • What areas of mathematics and science are assessed in TIMSS?
      • At grade 4, TIMSS focuses on three domains of mathematics:
        • numbers (manipulating whole numbers and place values; performing addition, subtraction, multiplication, and division; and using fractions and decimals),
        • geometric shapes and measures, and
        • data display.

        At grade 8, TIMSS focuses on four domains of mathematics:
        • numbers,
        • algebra,
        • geometry, and
        • data and chance.

        At grade 4, TIMSS focuses on three domains of science:
        • life science,
        • physical science, and
        • Earth science.

        At grade 8, TIMSS focuses on four domains of science:
        • biology,
        • chemistry,
        • physics, and
        • Earth science.
    • How many U.S. schools and students participated in previous TIMSS cycles?
      • At grade 4

        Assessment year    Participating schools    Participating students    Overall weighted response rate (percent)
        1995               182                      7,296                     80
        2003               248                      9,829                     78
        2007               257                      7,896                     84

        At grade 8

        Assessment year    Participating schools    Participating students    Overall weighted response rate (percent)
        1995               183                      7,087                     78
        1999               221                      9,072                     85
        2003               232                      8,912                     73
        2007               239                      7,377                     77

        NOTE: The overall weighted response rate is the product of the school participation rate, after replacement, and the student participation rate, after replacement. There was no grade 4 assessment in 1999.
    • How does TIMSS select a representative sample of students?
      • To provide valid estimates of student achievement and characteristics, TIMSS selects a sample of students that represents the full population of students in the fourth and eighth grades. This population was defined internationally as (1) all students enrolled in the grade corresponding to the fourth year of formal schooling, beginning with the first year of schooling as defined by the International Standard Classification of Education (ISCED), Level 1; and (2) all students enrolled in the grade corresponding to the eighth year of schooling, again beginning with ISCED Level 1.

        The sample design employed by the TIMSS 2007 assessment is generally referred to as a three-stage stratified cluster sample. Schools were selected at the first stage with probability proportional to size (PPS), size being the estimated number of students enrolled in the target grade. The second-stage sampling units were classrooms within sampled schools. Countries were required to randomly select a minimum of one eligible classroom per target grade per school from a list of eligible classrooms prepared for each target grade. The third-stage sampling units were students within sampled classrooms. Generally, all students in a sampled classroom were to be selected for the assessment. However, it was possible to sample a subgroup of students within a classroom, but only after consultation with Statistics Canada, the organization serving as the sampling referee.

        TIMSS guidelines call for a minimum of 150 schools to be sampled per grade, with a minimum of 4,000 students assessed per grade. The school response rate target is 85 percent for all countries. A minimum participation rate of 50 percent of schools from the original sample is required for a country’s data to be included in the international database. The response rate target for classrooms is 95 percent, and the target student response rate is 85 percent, from both original and substitute schools.

        U.S. sampling frame
        The TIMSS U.S. sample is drawn from the Common Core of Data (CCD) listing of public schools supplemented with the Private School Universe Survey (PSS) listing of private schools. The combination of these national listings has proven to be close to 100 percent complete.

        U.S. sampling design
        The U.S. TIMSS sample used a three-stage stratified cluster sampling design. While the U.S. sampling frame was not explicitly stratified, it was implicitly stratified (that is, sorted for sampling) by four categorical stratification variables: type of school (public or private); region of the country (Northeast, Central, West, Southeast); community type (eight levels); and minority status (above or below 15 percent of the student population). The first stage made use of a systematic PPS technique to select schools for the original sample. The second stage consisted of selecting intact mathematics classes within each participating school. All students in sampled classrooms were selected for assessment. In this way, the overall sample design for the United States was intended to approximate a self-weighting sample of students as much as possible, with each fourth- or eighth-grade student having an equal probability of selection.

        Substitute schools
        Countries are allowed to use substitute schools (selected during the sampling process) to increase the response rate once the 50 percent minimum participation rate among originally sampled schools is reached. In accordance with TIMSS guidelines, substitute schools are identified by assigning the two schools neighboring the sampled school in the frame as substitutes to be used in instances where an original sampled school refuses to participate. Substitute schools are required to be in the same implicit stratum (i.e., have similar demographic characteristics) as the sampled school.
    • Have there been changes in the countries participating in TIMSS?
      • Yes. The composition of participating countries in TIMSS has changed somewhat from 1995 to 2007, as some countries have dropped out and others have joined.

        In 2007, more than 56 separate countries participated in TIMSS. TIMSS also allows subnational entities to participate as benchmarking partners in the assessment. The subnational entities that participated in TIMSS 2007 were as follows: the Basque Country in Spain, four Canadian provinces (Alberta, British Columbia, Ontario, and Quebec), Dubai, and two states in the United States (Massachusetts and Minnesota). In the case of the Canadian provinces, the Basque Country, and Dubai, the larger nation in which they are located chose not to participate. In the case of the states of Massachusetts and Minnesota, students in these states were eligible for participation in the U.S. national sample as well as in the separate samples that these states drew for the study.
    • If the makeup of the countries changes across the years, how can one compare countries to the TIMSS scale average?
        Achievement results from TIMSS are reported on a scale from 0 to 1,000, with a TIMSS scale average of 500 and a standard deviation of 100. The scale is based on the 1995 results, and the results of all subsequent TIMSS administrations have been placed on this same scale. This allows countries to compare their performance over time as well as against a set standard, the TIMSS scale average. Because the TIMSS scale average is a fixed standard that is not recomputed with each administration, comparisons to it remain meaningful even as the makeup of the participating countries changes.
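        A back-of-the-envelope illustration of what a fixed metric means (the calibration constants below are placeholders, not the published TIMSS parameters): a proficiency estimate is mapped onto the 1995 metric by a linear transformation whose reference mean and standard deviation never change.

          # Placeholder values; the real 1995 calibration constants differ.
          MU_1995, SIGMA_1995 = 0.0, 1.0   # 1995 proficiency mean and SD

          def to_timss_scale(theta):
              # Map a proficiency estimate onto the fixed 1995 metric.
              return 500 + 100 * (theta - MU_1995) / SIGMA_1995

          print(to_timss_scale(0.32))      # 532.0, regardless of which
                                           # countries participate that year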
    • How do the results of TIMSS compare with the results in PISA?
        The TIMSS 2007 results at grade 8, the grade closest to the age of PISA students, showed U.S. average scores higher than the TIMSS scale average in both mathematics and science. In PISA 2006, the average scores of U.S. 15-year-old students were below the Organization for Economic Cooperation and Development (OECD) average (the average score of students in the 30 OECD countries). Such differences are difficult to interpret because, while both TIMSS and PISA measure the mathematics and science achievement of students, they do so for different sets of students, in different ways, and in different sets of countries.

        TIMSS focuses on the mathematics and science achievement of students in the fourth and eighth grades, and selects whole classrooms of students for this purpose. In contrast, PISA aims to assess the mathematics and science literacy of students near the end of their compulsory schooling, at age 15. These students range across several grades in most countries.

        TIMSS draws its content directly from the school curriculum and is designed to assess how well students have learned what they have ostensibly been taught. TIMSS emphasizes the links between achievement, mathematics and science curricula, and classroom practices. In contrast, PISA’s intent is to measure a "yield" of the skills and competencies accumulated and applied in real-world contexts by students at age 15. PISA emphasizes the mastery of processes, understanding of concepts, and application of knowledge. It draws not only from school curricula but also from learning that may occur outside of school. PISA does not explicitly examine mathematics and science curricula and classroom practices, though it does collect school information, including school background information and information on school practices and resources.

        Both assessments cover much of the world and include key economic partners and competitors, but there is only partial overlap between the sets of participating countries. For instance, only 26 of the 48 countries that participated in TIMSS 2007 at grade 8 participated in PISA 2006. PISA focuses on the 30 OECD-member countries, treating the non-OECD jurisdictions separately. Comparing the PISA countries with the TIMSS countries highlights the different sets of countries participating in each study. For example, European countries make up about two-thirds of all PISA countries but only one-third of TIMSS countries, and Middle Eastern countries make up about 3 percent of all PISA countries but 25 percent of TIMSS countries. Overall, about one-half of TIMSS countries participate in PISA, and about one-half of PISA countries are in TIMSS as well.
    • How does the mathematics and science achievement of U.S. students on TIMSS compare with achievement on NAEP?
      • Mathematics
        The results from NAEP and TIMSS include information on trends over time in fourth- and eighth-grade mathematics for a similar time interval: in NAEP between 1996 and 2007 and in TIMSS between 1995 and 2007. For both grades, the trends shown by NAEP and TIMSS are largely consistent with one another. Both assessments showed statistically significant increases in the mathematics performance of fourth- and eighth-grade students—overall, among boys, and among girls.

        NAEP also reported increases over time for each of four racial/ethnic groups (White, Black, Hispanic, and Asian), for students at the top and bottom extremes of the distribution (at the 10th and 90th percentiles), and for students receiving free or reduced-price lunch, at both grades. The only exception was Asian eighth-grade students, for whom there was no measurable change; these results were calculated over a different time period (1992 to 2007) than the other NAEP trends. TIMSS detected increases in mathematics performance for only some of these groups (e.g., White and Black students in both grades, students at the 10th percentile in both grades) and no measurable change for others (e.g., Hispanic fourth-grade students). This is likely due to NAEP's larger sample sizes, which make it more sensitive than TIMSS to small changes among nationally relevant subgroups; TIMSS is designed primarily to detect differences among countries.
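        A rough, invented illustration of that sample-size point: the standard error of a subgroup mean shrinks with the square root of the subgroup's sample size, so the smallest gain that reaches statistical significance shrinks too. The numbers below are made up and ignore design effects from cluster sampling, which would inflate the real standard errors.

          import math

          SD = 100.0                            # score standard deviation
          def min_detectable_gain(n_per_round):
              # SE of a difference of two independent subgroup means,
              # then a 5%-level two-sided significance criterion.
              se_diff = math.sqrt(2) * SD / math.sqrt(n_per_round)
              return 1.96 * se_diff

          for n in (500, 2000, 8000):           # hypothetical subgroup sizes
              print(n, round(min_detectable_gain(n), 1))
          # 500 -> 12.4, 2000 -> 6.2, 8000 -> 3.1: the larger sample can
          # certify much smaller score gains as statistically significant.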

        Science
        The results from NAEP and TIMSS also provide trend information for fourth- and eighth-grade science, although covering a slightly shorter time interval in NAEP than in TIMSS. NAEP provides trends for the period 1996 to 2005 and TIMSS for the period 1995 to 2007. Compared with mathematics, the trends shown by NAEP and TIMSS in science are less consistent with one another, which may not be surprising given the differing time periods and the differences in the assessments discussed in the previous sections. For example, in fourth grade, NAEP shows that there was an increase in students' science performance both overall and among boys between 1996 and 2005, whereas TIMSS did not detect any change in performance for either of those groups from 1995 to 2007.

        NAEP also reported increases in science performance for four racial/ethnic subgroups (White, Black, Hispanic, and Asian), whereas TIMSS only reported increases for Black and Asian students in the fourth grade. At the eighth-grade level, neither NAEP nor TIMSS showed any change in science performance among students overall. But in contrast to the fourth-grade results, TIMSS reported increases for Black, Hispanic, and Asian eighth-grade students, whereas NAEP only reported increases among Black students. This suggests that Hispanic and Asian eighth-grade students performed relatively better on the content unique to TIMSS than on the content unique to NAEP.
    • Can you directly compare TIMSS scores at grade 4 to scores at grade 8?
        The scaling of TIMSS data is conducted separately for each grade and each content domain. While the scales were each created to have a mean of 500 and a standard deviation of 100, the subject matter and the level of difficulty of the items necessarily differ between the two grades. Therefore, direct comparisons between scores across grades should not be made.
    • On TIMSS, why do U.S. boys outperform girls in mathematics at grade 4 but not at grade 8, and U.S. boys outperform girls in science at grade 8 but not at grade 4? Why aren't differences between the sexes more consistent?
        The seeming inconsistencies between the 2007 achievement scores of U.S. boys and girls in mathematics and science are not easily explained. Research into differences in achievement by sex has been unable to offer any definitive explanation for these differences. For example, in examining sex differences primarily at the high school level, Xie and Shauman (2003)1 found that "differences in mathematics and science achievement cannot be explained by the individual and familial influences that we examine." Indeed, the fact that sex differences vary across the participating TIMSS countries (some favoring males and others favoring females) would appear to support the idea that the factors behind sex differences in mathematics and science achievement are complex.

        1Xie, Y., & Shauman, K. (2003). Women in Science: Career Processes and Outcomes. Cambridge, MA: Harvard University Press.
    • When are TIMSS data collected?
        TIMSS operates on a 4-year cycle, with 1995 being the first year it was administered. For TIMSS 2007, countries in the Northern Hemisphere conducted the assessment between April and June 2007, while countries in the Southern Hemisphere conducted it in October and November 2006. In both hemispheres, the assessment is conducted near the end of the school year.
    • Where can I get a copy of the TIMSS U.S. Report?
      • U.S. TIMSS reports and related data products are available on the NCES TIMSS website at http://nces.ed.gov/timss.
    • When is TIMSS scheduled to be administered next?
      • TIMSS is scheduled to be administered next in 2015, with results to be reported at the end of 2016.
[Show All] International Exchange Programs and Foreign Study
