RELATIONSHIP OF ACHIEVEMENT OF AIMSWEB MATHEMATICS CBM ASSESSMENT ON TAKS MATHEMATICS ASSESSMENT

A Dissertation by BARBI AIRHART DONEHOO

Submitted to the Office of Graduate Studies of Texas A&M University-Commerce in partial fulfillment of the requirements for the degree of DOCTOR OF EDUCATION

August 2015

Approved by:
Advisor: Gilbert Naizer
Committee: Mark Reid
           Delores Rice
Head of Department: Martha Foote
Dean of the College: Timothy Letzring
Dean of Graduate Studies: Arlene Horne

Copyright © 2015 Barbi Airhart Donehoo

ABSTRACT

RELATIONSHIP OF ACHIEVEMENT OF AIMSWEB MATHEMATICS CBM ASSESSMENT ON TAKS MATHEMATICS ASSESSMENT

Barbi Airhart Donehoo, EdD
Texas A&M University-Commerce, 2015
Advisor: Gilbert Naizer, PhD

This study sought to determine whether differences existed between scores on the AIMSweb curriculum-based measure (CBM) in mathematics and the Texas Assessment of Knowledge and Skills (TAKS) mathematics test and whether differences were due to sex, ethnicity, and socioeconomic status (SES). Data were obtained for third-grade students who took the TAKS mathematics test and the AIMSweb CBM in mathematics, as well as for sex, ethnicity, and SES. The researcher analyzed the data to determine whether differences existed between AIMSweb CBM in mathematics and TAKS mathematics achievement, sex, ethnicity, and SES. Kruskal-Wallis tests of differences between several independent groups were used as the nonparametric alternative to the one-way analysis of variance (ANOVA) because the sample sizes of the groups were disparate. The data analysis revealed significant differences between TAKS mathematics scale scores and AIMSweb mathematics groups for the three administrations of the universal screener after removing the effects of ethnicity, sex, and SES.
The findings also revealed no significant differences between each administration of AIMSweb and sex, ethnicity, or SES for each AIMSweb group after removing the effects of sex, ethnicity, and SES.

ACKNOWLEDGEMENTS

"Strive to be significant, not successful" is a quote I live by each day because, if you are significant, you are successful. I want to thank my family, friends, and colleagues who are significant to me each and every day. I thank my family for being the motivation behind this degree, for being unbelievably supportive and loving, and for providing humor on a daily basis, which helped me get through the rough days. My friends provided laughter, support, encouragement, and unconditional love that never went unnoticed, and I will forever be grateful and thankful to have them in my life. To the many colleagues I have been lucky to work with throughout this journey, I thank you for your advice, encouragement, and the endless knowledge I have gained from working with each of you. Special thanks go to Dr. Gilbert Naizer for his unlimited support, knowledge, patience, time, and advice throughout this process. I cannot thank him enough for his continuous reminders to stay focused and to continue writing on days when I felt I was simply staring at a blank page. Also, without the abundant statistical knowledge and support from Katy Denson, this paper would never have been completed. Thank you to Mark Reid and Delores Rice for being a part of my dissertation committee. Lastly, I would like to thank my two children, Del and Roxi, for being my source of strength each and every day. They provided hugs when I needed them most, encouragement to never give up, forgiveness when writing had to be done instead of spending time with them, and unconditional love daily.
I also want to thank my twin sister, Angie, for always knowing when to send a funny quote or text; calling me when she knew I needed to hear her voice; and supporting, praising, and encouraging me constantly.

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER

1. INTRODUCTION
   Statement of the Problem
   Purpose of the Study
   Research Questions
   Research Hypotheses
   Significance of the Study
   Method of Procedure
      Research Design
      Research Setting
      Participant Description
      Procedures
      Treatment of the Data
   Definitions of Terms
   Limitations
   Delimitations
   Assumptions
   Organization of the Study

2. REVIEW OF THE LITERATURE
   Classroom Assessments
      Mathematics Assessment
      Summative Assessments
      Formative Assessments
   Curriculum-Based Measurements
      AIMSweb
      Texas Assessment of Knowledge and Skills (TAKS)
   Differences in Assessments by Sex
   Ethnicity in Assessments
   Socioeconomic Status in Assessments
   Conclusion

3. METHOD OF PROCEDURE
   Design of the Study
   Research Questions
   Research Hypotheses
   Instrumentation
      AIMSweb Universal Screener
      TAKS Mathematics Subtest
   Sample Selection
   Data Gathering
   Treatment of Data
   Summary

4. PRESENTATION OF FINDINGS
   Research Question 1
      Fall Administration
      Winter Administration
      Spring Administration
      Null Hypothesis
   Research Question 2
      Fall Administration
      Winter Administration
      Spring Administration
      Null Hypothesis
   Research Question 3
      Fall Administration
      Winter Administration
      Spring Administration
      Null Hypothesis
   Research Question 4
      Fall Administration
      Winter Administration
      Spring Administration
      Null Hypothesis
   Summary

5. SUMMARY OF THE STUDY AND THE HYPOTHESES FINDINGS, CONCLUSIONS, IMPLICATIONS, RECOMMENDATIONS FOR FUTURE RESEARCH, AND SUMMARY
   Summary of the Study
   Summary of Findings
   Conclusions
      Research Question 1
      Research Question 2
      Research Question 3
      Research Question 4
   Implications
   Recommendations for Future Research
   Summary

REFERENCES

VITA

LIST OF TABLES

TABLE
1. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group in Fall 2010
2. All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group in Fall 2010
3. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group in Winter 2011
4. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group in Winter 2011
5. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group in Spring 2011
6. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group in Spring 2011
7. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Sex in Fall 2010
8. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Sex in Fall 2010
9. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Sex in Winter 2011
10. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Sex in Winter 2011
11. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Sex in Spring 2011
12. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Sex in Spring 2011
13. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and SES in Fall 2010
14. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Socioeconomic Status in Fall 2010
15. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and SES in Winter 2011
16. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and SES in Winter 2011
17. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and SES in Spring 2011
18. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and SES in Spring 2011
19. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Ethnicity in Fall 2010
20. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb and Ethnicity in Fall 2010
21. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Ethnicity in Winter 2011
22. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb and Ethnicity in Winter 2011
23. Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Ethnicity in Spring 2011
24. Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb and Ethnicity in Spring 2011

CHAPTER 1

INTRODUCTION

Mathematics achievement in the United States is assessed each year to measure student performance at different grade levels.
According to 2013 National Assessment of Educational Progress (NAEP) data, the average mathematics scores for fourth and eighth graders were one point higher than in 2011, and 28 and 22 points higher, respectively, than in the first assessment year, 1990 (National Center for Education Statistics [NCES], 2013). NAEP mathematics scores are reported on a scale that ranges from 0 to 500. The NAEP data from 2009 showed that the gains in students' average mathematics scores from earlier school years did not continue from 2007 to 2009 at Grade 4, and the overall average score for fourth graders in 2009 was unchanged from 2007. For over 20 years, evaluations of mathematics achievement have revealed deficits in performance for students in the United States compared to other nations (NCES, 1992). Although scores have improved, NAEP data from 2013 showed that international comparisons still placed the United States in the middle in mathematics. Stiggins (2001) stated, "Good teaching encompasses good assessment" (p. 9). If instruction and assessment are not aligned, assessment results will not validate the instruction. Educators often discuss best practices for teaching, which include the topic of assessment. Teachers must assess their students to monitor progress, or the lack thereof. With the recent attention on high-stakes testing, assessments have received a negative reputation because the public sometimes assumes that teachers are "teaching to the test." Teachers may adapt their instruction to test objectives and formats because of the attention given to test results (National Academy Press [NAP], 1993). According to Berlinger (2011), high-stakes testing pressures teachers to do whatever is deemed necessary to achieve the goal of mastery.
Even with the negative attention that high-stakes assessments have received, data collected from these tests allow teachers to determine the strengths and weaknesses of their students and of the instruction delivered. Teachers can then make changes, adaptations, and interventions to better meet the needs of their students. Rather than focus on the negative effect of high-stakes testing on education, the focus should be on the benefit of having data from these assessments and being able to monitor, adapt, and adjust instruction for students to be successful. Educators use numerous methods to evaluate mathematics achievement. One of the most popular is standardized multiple-choice testing (Reich, 2009). Data from these tests are analyzed, and decisions are made as to how instruction needs to be modified. Sex, ethnicity, and socioeconomic status (SES) data are also important to consider because many states, such as Texas, use SES as a factor in determining accountability ratings for schools (Cunningham & Sanzo, 2002). According to Cunningham and Sanzo (2002), students with stronger economic support systems at home tend to achieve at or above grade level, whereas those from lower SES homes with less support tend to achieve at or below grade level. They also reported that tests tend to show racial achievement gaps in many states, with lower overall minority student performance compared to non-minority peers. Curriculum-based measurement (CBM) was designed to sample items from the curriculum content domain to produce outcome measures that represent the curriculum and that are able to demonstrate academic growth (Deno, 1985; Fuchs & Deno, 1991; Fuchs, Fuchs, & Courey, 2005; Leh, Jitendra, Caskie, & Griffin, 2007; Shinn, 1998).
Data from CBM provide teachers with information about student progress and can indicate whether changes or modifications to instruction or curriculum are necessary (Fuchs et al., 2005; Fuchs, Fuchs, Hamlett, & Stecker, 1991; Leh et al., 2007; Whinnery & Stecker, 1992). AIMSweb is a curriculum-based assessment that measures math skills and can be generalized to any curriculum. AIMSweb has published studies relating reading results to end-of-year reading assessments; however, little has been published regarding mathematics. This study aimed to determine whether differences existed between the AIMSweb mathematics CBM and the Texas Assessment of Knowledge and Skills (TAKS) mathematics test. This study also aimed to determine whether differences in scores on these assessments existed due to sex, ethnicity, and SES among third-grade students. The results of this study should help educators tailor instruction and needed interventions for students to master concepts in mathematics. The findings should also help answer the question of whether performance on the AIMSweb mathematics CBM can help determine success, or the lack of it, on the TAKS mathematics test, as well as whether differences exist due to sex, ethnicity, and SES.

Statement of the Problem

Because high-stakes testing is a part of public education, it is impossible to ignore that students need practice for end-of-year tests during the school year. Teachers often incorporate time to take practice tests, score tests, review results with students, and reteach concepts for which students did not meet standards. Therefore, practice tests need to be accurate assessments of a student's ability, be easily scored, and not take too much instructional time away from the teacher. AIMSweb is a test designed to be quick and easily administered and to provide information as to which students might need more intervention to be successful in skills designed for their specific grade levels.
However, it is unknown whether differences exist between the results of the AIMSweb CBM and TAKS mathematics tests or whether differences exist due to sex, ethnicity, and SES. Determining whether differences exist due to sex, ethnicity, and SES will help educators provide instruction that meets the needs of learners in each of these student populations.

Purpose of the Study

The purpose of this study was to determine whether differences existed in TAKS results compared to AIMSweb screener results for each administration of the test (fall, winter, and spring), while also considering ethnicity, SES, and sex for third-grade students. School districts use a variety of assessments to monitor student progress. The assessments and tools that districts use to monitor progress may or may not be validated to provide accurate data on how students will perform on high-stakes tests. Schools and districts use assessments from large publishers without truly knowing the data or research that supports their use. This study examined differences between TAKS mathematics achievement and AIMSweb CBM mathematics proficiency levels, as well as by sex, ethnicity, and SES.

Research Questions

The following research questions guided this study to determine whether differences exist in TAKS mathematics achievement and AIMSweb CBM mathematics proficiency levels, as well as by sex, ethnicity, and SES.

1. Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners?
2. Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners?
3. Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners?
4.
Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners?

Research Hypotheses

The researcher tested the following null hypotheses:

H1: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners.
H2: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners.
H3: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners.
H4: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners.

Significance of the Study

With the passing of No Child Left Behind (NCLB, 2002), all states are required to test students to measure academic success rates. States are not required to use the same testing instrument; however, each state must have an assessment tool that measures the curriculum adopted for the different grade levels and subjects tested. Berlinger (2011) indicated that more instructional time is spent teaching reading and mathematics and less on other subject areas such as social studies or science, a practice referred to as narrowing the curriculum: concentrating instructional time on the topics that are assessed. Although some consider that narrowing of the curriculum has a negative influence on education, Yeh (2005) found otherwise.
Teachers in Yeh's study felt that if narrowing the curriculum benefited students and did not force them to eliminate valuable chunks of the curriculum, then the assessment was well designed and tested the curriculum effectively. If teachers feel that an assessment is aligned to the curriculum, practice tests can be used throughout the year to ensure the curriculum is taught and students are successful. Yeh's study is important because school districts nationwide use AIMSweb testing to determine a student's ability in mathematics. Further, all third-grade public school children in Texas are required to take a state-based assessment in mathematics. Therefore, it would be beneficial to know whether performance on AIMSweb testing is a valid predictor of performance on the state assessment. AIMSweb has published studies (Pearson Education, 2010) indicating correlations between AIMSweb CBM scores and achievement on state-based assessments in reading; however, little is known about the mathematics portion of the assessment. AIMSweb is used in many school districts throughout the United States because of its ability to assess students in mathematics, not just in reading, which necessitates additional research to determine whether the assessment is a valid predictor of student achievement on state-based mathematics assessments. School districts have the ability to choose the assessment tool used to determine student success, which makes a study such as this beneficial to ensure that money is spent wisely to assess students. Not only will the results from this study benefit Gold ISD, but they will also benefit all districts in the state of Texas that use the AIMSweb testing tool.

Method of Procedure

The following section details the research setting, participants, data collected, and the analysis performed.
The researcher relied on data collected from the 2011 TAKS scores in mathematics for third-grade students from a small suburban school district, Gold Independent School District (ISD; pseudonym), in central Texas. Data were collected from the fall, winter, and spring administrations of the AIMSweb CBM in mathematics for the 2010–2011 school year. Collected data also included sex, ethnicity, and SES for those same third-grade students. AIMSweb is an assessment, data organization, and reporting system that provides the framework and data necessary for response to intervention (RtI) and multi-tiered instruction methods (Pearson Education, 2012). The CBM is a standardized test in mathematics based on general outcome measurement principles that educators can use to evaluate student progress efficiently and accurately. Students are given the universal screener three times each school year. Data from the screeners are used to plan instruction and interventions. AIMSweb can be used to assess mathematics skills and can be generalized to any curriculum. The mathematics CBM is a timed, 8-minute, open-ended, paper-based test that can be administered in groups or individually (Pearson Education, 2010). The TAKS was designed to measure the extent to which a student has learned and is able to apply the defined knowledge and skills at each tested grade level (TEA, 2011b). Evidence to support the validity of the TAKS is organized into the following five categories: test content, response processes, internal structure, relations to other variables, and consequences of testing. Reliability of the TAKS was measured using the Kuder-Richardson 20 (KR20) formula, which applies to tests composed only of multiple-choice items (TAKS Technical Digest, 2011b). The TAKS test was deemed reliable in the state of Texas because of the multi-step procedures followed during the design of the test.
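KR20 treats each multiple-choice item as scored right (1) or wrong (0) and compares the summed item variances to the variance of the total scores. The following sketch is illustrative only and is not part of the original study; the function name and data layout are assumptions:

```python
import numpy as np

def kr20(item_scores):
    """Kuder-Richardson 20 reliability for dichotomous (0/1) items.

    item_scores: 2-D array, rows = examinees, columns = items.
    Population (ddof=0) variances are used consistently, so a test of
    identical items yields a reliability of exactly 1.0.
    """
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                 # number of items
    p = item_scores.mean(axis=0)             # proportion correct per item
    q = 1.0 - p                              # proportion incorrect per item
    total_var = item_scores.sum(axis=1).var(ddof=0)  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)
```

For example, a test whose items are perfectly consistent (every examinee answers all items the same way) returns 1.0, while statistically unrelated items drive the estimate toward 0.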
Research Design

The quasi-experimental study used the Kruskal-Wallis test of differences to determine whether differences existed between the AIMSweb CBM in mathematics for each of the three administrations and the TAKS mathematics test and whether differences existed in test scores by sex, ethnicity, and SES. The dependent variable was the TAKS scale score. The independent variables were AIMSweb mathematics CBM proficiency levels, sex, ethnicity, and SES. This research method was appropriate for the study because it separates the effects of the independent variables on the dependent variable and examines the unique contribution of each variable (Allison, 1999).

Research Setting

This study used a convenience sample and took place in a small suburban school district consisting of nine elementary campuses in the southern central United States. The district comprised approximately 8,000 students, of whom 11% were African American; 22% Hispanic; 63% White; and 4% American Indian, Asian, Pacific Islander, or multiracial. Additionally, 27% of students in the district were economically disadvantaged, 5% limited English proficient (LEP), and 25% at risk for failure. The AIMSweb assessment used in this study was in its first year of implementation in the selected school district, with the universal screener in mathematics given three times during the year (fall, winter, and spring). Data from all three administrations of the test for third-grade students who met the study criteria were included in the analyses.

Participant Description

Data were collected from the AIMSweb mathematics CBM and TAKS mathematics test administered to third-grade students enrolled in Gold ISD. Data collected included sex, ethnicity, SES, and achievement scores for each test administration. Students who took an alternate form of the TAKS (TAKS-Alternative, TAKS-Modified, or TAKS-Accommodated) were excluded from the study.
Between 463 and 472 third-grade students who were enrolled on the dates the tests were administered were included in the study.

Procedures

The school district and the university Institutional Review Board (IRB) granted permission to conduct the study prior to data collection. Data were provided by the school district for the 2011 TAKS mathematics; the three administrations (fall, winter, and spring) of the AIMSweb CBM for the 2010–2011 school year; and student sex, ethnicity, and SES. All data were anonymous and unidentifiable to the researcher. Data were provided in electronic form using a spreadsheet that contained all information necessary to address each variable. The researcher analyzed the data using the Statistical Package for the Social Sciences (SPSS) to determine whether differences existed for each research question. In addition to reporting the results in this dissertation, the researcher will provide the school district with a report of the results so officials are informed when making changes to curriculum or assessments administered to students.

Treatment of the Data

Before the data analyses were conducted, the researcher computed residual scores to remove the effects of sex, ethnicity, and SES as needed for each research question (Cohen, Cohen, West, & Aiken, 2003). This was done using multiple regression analyses with the TAKS mathematics scale score as the dependent variable and the relevant combination of sex, ethnicity, and SES as the independent variables. In essence, this procedure statistically removed the effects of variables that could confound the results (Cohen et al., 2003). The residual scores were saved and converted back to a TAKS-like score with similar means and standard deviations. The new residual scores were used as the dependent variable in the analyses.
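The procedure described above — residualizing the scale score on the demographic variables, rescaling the residuals to a TAKS-like metric, and then comparing groups nonparametrically — can be sketched as follows. This is an illustration only: the variable names and synthetic data are hypothetical, Python stands in for the SPSS analyses actually used, and the final lines show the effect-size convention reported in this study (r = z/√n, with z from the normal approximation to the Mann-Whitney U statistic) for one pairwise follow-up.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): TAKS scale scores, demographics
# coded 0/1, and an AIMSweb proficiency group coded 0-4.
n = 400
df = pd.DataFrame({
    "taks": rng.normal(2200, 150, n),
    "male": rng.integers(0, 2, n),
    "white": rng.integers(0, 2, n),
    "low_ses": rng.integers(0, 2, n),
    "proficiency": rng.integers(0, 5, n),
})

# Step 1: residualize the TAKS score on sex, ethnicity, and SES via ordinary
# least squares, then rescale the residuals to a TAKS-like mean and SD.
X = np.column_stack([np.ones(n), df["male"], df["white"], df["low_ses"]])
beta, *_ = np.linalg.lstsq(X, df["taks"].to_numpy(), rcond=None)
resid = df["taks"].to_numpy() - X @ beta
df["taks_resid"] = df["taks"].mean() + resid * (df["taks"].std() / resid.std(ddof=1))

# Step 2: Kruskal-Wallis test across the five proficiency groups.
groups = [g["taks_resid"].to_numpy() for _, g in df.groupby("proficiency")]
H, p_value = stats.kruskal(*groups)

# Step 3: effect size for one pairwise follow-up comparison, r = z / sqrt(n),
# using the normal approximation to the Mann-Whitney U statistic.
a, b = groups[0], groups[4]
u, _ = stats.mannwhitneyu(a, b, alternative="two-sided")
n1, n2 = len(a), len(b)
z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
r = z / np.sqrt(n1 + n2)
```

Because the residuals from a regression with an intercept have mean zero, the rescaled scores keep the original mean and standard deviation while carrying none of the linear demographic effects, which is the sense in which the confounds are "removed."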
The researcher used Kruskal-Wallis tests of differences between several independent groups as the nonparametric alternative to the one-way analysis of variance (ANOVA) because the sample sizes for the groups were disparate and violated the assumptions of relatively equal sample sizes and homogeneity of variance (Field, 2013). All possible pairwise post hoc comparisons were computed to assess differences, using p-values adjusted for the number of comparisons, to address each research question. Effect sizes were calculated for each comparison by dividing the standardized test statistic (z-score) by the square root of the total number of students in the comparison (Field, 2013). The researcher computed three separate analyses, one for each administration of the AIMSweb mathematics universal screener in the fall, winter, and spring. The dependent variable for each analysis was the residualized version of the TAKS mathematics scale score. Although all possible comparisons were computed, only comparisons of the independent variables for each hypothesis at the same AIMSweb proficiency levels are reported in the tables and discussion because these were the primary groups of interest. For Research Question 1, the TAKS mathematics score was adjusted for sex, ethnicity, and SES. This process statistically removed the effects of the three demographic variables from the mathematics scale score (Cohen et al., 2003). The independent variable was the proficiency level established by the AIMSweb screener, which included well-above average, above average, average, below average, and well-below average. For Research Question 2, the TAKS mathematics score was adjusted for ethnicity and SES, statistically removing the effects of these two demographic variables from the mathematics scale score but leaving the effects of sex (Cohen et al., 2003). A new grouping variable was computed to account for sex at each proficiency level established by the AIMSweb screener.
The following 10 groups were created: well-above average males, well-above average females, above average males, above average females, average males, average females, below average males, below average females, well-below average males, and well-below average females. For Research Question 3, separate analyses were computed for each administration of the AIMSweb mathematics universal screener in the fall, winter, and spring. The dependent variable for each analysis was the residualized version of the TAKS mathematics scale score. The TAKS mathematics score was adjusted for ethnicity and sex, statistically removing the effects of these two demographic variables from the mathematics scale score but leaving the effects of SES (Cohen et al., 2003). A new grouping variable was computed to account for SES at each proficiency level established by the AIMSweb screener. Ten groups were created: well-above average disadvantaged, well-above average not disadvantaged, above average disadvantaged, above average not disadvantaged, average disadvantaged, average not disadvantaged, below average disadvantaged, below average not disadvantaged, well-below average disadvantaged, and well-below average not disadvantaged. For Research Question 4, separate analyses were computed for each administration of the AIMSweb mathematics universal screener in the fall, winter, and spring. The dependent variable for each analysis was the residualized version of the TAKS mathematics scale score. The TAKS mathematics score was adjusted for sex and SES, statistically removing the effects of these two demographic variables from the mathematics scale score but leaving the effects of ethnicity (Cohen et al., 2003). Because a larger percentage of students in this district were White (63%), ethnicity was disaggregated into two groups, White and other. Other included all students with any reported ethnicity other than White.
A new grouping variable was computed to account for ethnicity at each proficiency level established by the AIMSweb screener. Ten groups were created: well-above average other, well-above average White, above average other, above average White, average other, average White, below average other, below average White, well-below average other, and well-below average White.

Definitions of Terms

The following terms are defined as they are used in this study:

AIMSweb CBM mathematics. “AIMSweb is a universal screening that uses brief, valid, and reliable measures of mathematics performance that can be generalized to any curriculum” (Pearson Education, n.d.).

Ethnicity. The demographic characteristic used in this study; specifically, White and other, which consisted of African American, Hispanic, and American Indian (TEA, 2011a).

Mathematics achievement. The passing standard set by the TEA on the state test, TAKS, and by the AIMSweb testing administrators for AIMSweb (Pearson Education, 2012; TEA, 2011b).

Sex. “The state of being male or female” (Merriam-Webster Online, n.d.).

Socioeconomic status (SES). The demographic characteristic used in this study: low SES (disadvantaged) or not low SES (not disadvantaged; TEA, 2011a).

Third grade. A classroom typically comprising 8- and 9-year-olds (Tex. Educ. Code Ann. §25).

Texas Assessment of Knowledge and Skills (TAKS). The “Texas state standardized test used to assess student attainment of mathematics skills required under Texas education standards” (TEA, 2011b).

Limitations

This study had the following limitations:

1. Data from only one school district and one grade level were used.

2. After the fall administration of AIMSweb, students generally understood the format of the universal screener given in the winter and the spring.

3. The district selected had limited diversity in ethnic groups and SES.

Delimitations

The researcher applied the following delimitations to the study:

1. The study focused on third-grade students only.
2. Data were collected from one school district in Texas.

3. The study included only students who took the standard form of the TAKS; those who took the TAKS-Accommodated, TAKS-Alternative, or TAKS-Modified were excluded.

4. Data were collected for one academic school year.

Assumptions

The researcher made the following assumptions:

1. The TAKS is a valid and reliable measure of student achievement in mathematics.

2. The AIMSweb CBM is a valid and reliable measure of student achievement in mathematics.

3. Students put forth their best effort on both the AIMSweb and the TAKS.

4. The AIMSweb and TAKS were administered correctly according to the test design at each campus and within each classroom.

5. The sample selected was a close representation of the population of the district.

Organization of the Study

This dissertation comprises five chapters. Chapter 1 includes an introduction, statement of the problem, purpose of the study, research questions, research hypotheses, significance of the study, method of procedure, definition of terms, limitations, delimitations, and assumptions. Chapter 2 provides a review of the literature and includes information on mathematics and assessments. Chapter 3 consists of the research methodology. The data analysis is reported in Chapter 4. The findings and applications for educators are discussed in Chapter 5.

Chapter 2

REVIEW OF THE LITERATURE

According to the National Academy Press (NAP, 1993), “To be useful to society, assessment must advance education, not merely record its status” (p. 1). Assessments are a way to help teachers and parents measure what students know and to determine what they need to learn. Assessments play an important role in expressing clearly and directly how students are learning and how well school systems are responding to the national call for higher education standards (NAP, 1993).
Assessments in education can help teachers make curricular decisions about the content that has been taught and measure students’ knowledge of that content. Assessments can also help teachers understand what and how students learn, which assists them in planning activities that better meet students’ needs. Assessment results are also used to provide interventions when necessary and to modify or adapt instruction to meet students’ needs. Educators have used assessments in classrooms to monitor student progress and mastery of skills. Effective teachers use assessments as a means of diagnosing their students’ difficulties as well as monitoring their own instructional practices (Kulm, 1994). The time spent assessing students is increasing in classrooms throughout the country. Testing is viewed as an instrument to hold schools accountable for the instruction that occurs within the school building. Schools in Texas have earned an accountability rating since 1994 based on performance on the end-of-year state assessment. With the emphasis placed on high-stakes testing, these assessments have received a negative reputation because of the assumption that teachers are “teaching to the test.” Commonly used standardized testing has the ability to test many students at a low cost. Therefore, teachers do seem to adapt their instruction to the test objectives and format because of the attention given to test results (NAP, 1993). Gardner (1992) defined assessments as tools to obtain information about individuals’ skills and potentials, with the dual goals of providing useful feedback to individuals and helpful data to the surrounding community. Different types of assessments are designed to measure different aspects of student learning and classroom success. For example, classroom, formative, and summative assessments are designed to have different functions, but their definitions often become intertwined in the general and educational communities.
Classroom Assessments

Classroom assessments are designed to monitor classroom progress on daily skills. These assessments aim at course improvement rather than grading student progress, as a way to better understand student learning and improve teaching. Classroom assessments are used to increase dialogue between students and teachers to improve the learning process for all students. Additionally, classroom assessments help educators better understand what their students are learning. These assessments give students an equal opportunity to perform well because they are based solely on the instruction delivered in class. The types of assessments that educators use within the classroom are important to assessing learning (Education Testing Service, 2003).

Mathematics Assessment

Not all assessments created to measure mathematics ability inform educators about necessary curriculum changes. Most tests created to measure success in mathematics actually furnish information on procedures that students are able to perform correctly (Kulm, 1994). Many mathematics assessments are designed by following checklists of what students are required to learn, and these tests use multiple-choice questions to assess a broad variety of skills. Many current mathematics assessments distort mathematical reality by presenting questions as a set of isolated, disconnected fragments, facts, and procedures (NAP, 1993). For assessments to support learning, related tasks must provide genuine opportunities for all students to learn significant mathematics. Results from mathematical assessments should produce information that can be used to improve student access to mathematical knowledge, and students should learn some mathematics from the assessment tasks presented. Assessments make the goals for mathematics learning real to students, parents, policymakers, and the public (NAP, 1993).
Teachers can also measure the success of their instruction effectively through formative mathematics assessment and make adjustments when necessary (Deno, 1986). Mathematics assessments can be divided into two categories: internal and external. Internal assessments provide information about student performance, which allows teachers to make adjustments in their instructional techniques to improve student performance. External assessments provide information to state and local agencies, funding bodies, policymakers, and the public about mathematics programs (NAP, 1993).

Summative Assessments

Summative assessments are used to gauge student achievement over an extended period. These types of tests are designed to indicate how and what students know at any one point in time (Gorlewski, 2008). With summative assessments, data are collected after completing instruction (Deno & Espin, 1991). Many traditional tests, such as end-of-chapter tests, unit tests, textbook tests, or standardized tests, are considered summative. These tests are typically designed with a narrow focus on a few specific skills, concepts, and procedures that have been taught. Additionally, summative assessments seldom give a broad picture of how well students of varying backgrounds and capabilities received the instruction. Further, published tests rarely have the capability to supply teachers with the information needed to tailor instruction to the needs of their students (Kulm, 1994). With the passing of No Child Left Behind (NCLB, 2002), states are required to hold public schools accountable by assessing students in reading, mathematics, and science in Grades 3 through 8 and in one high school grade (Grodsky, Warren, & Felts, 2008). In addition to monitoring progress in specific grades, districts are required to monitor student progress based on poverty status, race, and ethnicity.
Standardized tests are commonly used to hold administrators and educators accountable for student academic achievement. The aim of standardized tests is to include identical items to assess student performance across similar environments. Therefore, these assessments monitor student performance and are not dependent on the test administration or administrator (Grodsky et al., 2008).

Formative Assessments

Formative assessments are used to collect data during instruction and to modify the instruction when necessary (Deno & Espin, 1991). These tests are designed to improve teaching and learning (Gorlewski, 2008). Using formative assessments, teachers can monitor and adjust their instruction in a more timely manner to better meet students’ needs. Formative assessments require repeated, frequent measurements that are easy and quick to administer and interpret and that provide useful information about student performance (Deno, 1985; Shinn, 1989). Formative and summative assessments can have the same design; however, the way results are used determines whether an assessment is formative or summative. Typically, summative assessments are associated with students earning a grade for the test, whereas formative assessments are not necessarily graded (Gorlewski, 2008). If a grade is given on a formative test, the primary intent is not for students to earn a grade and move on to the next topic, but rather to improve teaching and learning in the classroom.

Curriculum-Based Measurements

Curriculum-based measurements (CBMs) were designed to sample items from the curriculum content domain to produce outcome measures that represent the curriculum and that are sensitive to growth (Deno, 1985; Fuchs & Deno, 1991; Fuchs et al., 2005; Leh et al., 2007; Shinn, 1998).
Data from CBMs provide teachers with information about student progress and indicate whether changes or modifications to instruction or curriculum are necessary (Fuchs et al., 2005; Fuchs et al., 1991; Leh et al., 2007; Whinnery & Stecker, 1992). Curriculum-based measurements are a type of formative assessment tool that meets the scientific standards for progress monitoring (National Center on Student Progress Monitoring [NCSPM], 2005). Mathematics CBMs (M-CBMs) were designed to address the need for ongoing progress monitoring in mathematics computation and to serve as a measure of general mathematics achievement (Thurber, Shinn, & Smolkowski, 2002). With these types of assessments, students are expected to write answers to standardized computational tasks drawn from the general curriculum. These writing tasks vary from 2 to 5 minutes (Deno, 1985; Shinn, 1989, 1998). Research has found that M-CBMs have high interrater agreement (.97), high 1-week test-retest reliability (.87), moderate alternate-form reliability (.66), and high alternate-form reliability (.91; Thurber et al., 2002; Tindal, Marston, & Deno, 1983). Curriculum-based measures typically test one of two areas: computation/operations or application/problem solving. Computation involves working mathematics problems, whereas applications assess the use and understanding of mathematics concepts to solve problems (Howell, Fox, & Morehead, 1993; Salvia & Ysseldyke, 1991; Silbert, Carnine, & Stein, 1990). The M-CBM is a combination of computation and application knowledge (Howell et al., 1993). Additionally, M-CBMs for elementary school students typically require completion of computational or conceptual problems based on current grade level (Clarke, Baker, Smolkowski, & Chard, 2008). Limited studies have examined the use and effectiveness of the M-CBM. Thurber et al.
(2002) administered mathematics probes that measured computation, applications, or general mathematics competence, as well as reading tests, to 207 fourth-grade students. The results revealed that mathematics computation and application were distinct yet related to each other, and reading skills played an important role in general mathematics assessments.

AIMSweb

AIMSweb is an assessment and data management system that serves as the framework for any Response to Intervention (RtI) program and tiered instruction. This web-based tool provides multiple CBM assessments for universal screening and progress monitoring, along with web-based management, charting, and reporting. AIMSweb was designed for use in Grades K–8 to evaluate early literacy, reading, mathematics, and behavior skills (Pearson Education, 2012). The AIMSweb CBM-based probes are designed to be quick and easy to administer and score. Most probes are designed to be administered in 5 to 8 minutes, and some can be group-administered. The AIMSweb Mathematics Concepts and Applications (M-CAP) assessment was designed to assess a variety of mathematical domains. Each test item is given a value of 1, 2, or 3, depending on item difficulty. Students are scored based on the number of correctly answered test items. Shapiro, Dennis, and Fu (2015) used AIMSweb to investigate whether a single-point screening could predict student outcomes on annual state assessments in mathematics. The researchers assessed 250 third, fourth, and fifth graders once a month over a 7-month period using a computer adaptive test and CBMs in mathematics (including AIMSweb). The results showed that the AIMSweb M-CAP was not a significant predictor of achievement on the third- and fourth-grade state assessments; however, it was a significant predictor on the fifth-grade assessment.
Texas Assessment of Knowledge and Skills (TAKS)

The Texas Assessment of Knowledge and Skills (TAKS) is a standardized test used in Grades 3 to 11 to assess student acquisition of reading, writing, mathematics, science, and social studies skills required under the standards set forth by the Texas Education Agency (TEA). The assessment was developed and is scored by Pearson Educational Measurement with assistance from the TEA. The test for each grade level was designed based on a specific blueprint for test design. Test blueprints are based on educator review committees and TEA curriculum and assessment staff recommendations for each objective. According to the TEA, these assessments perform three functions: (a) “reflect the level of difficulty and range of content of the skills defined in the Texas Essential Knowledge and Skills (TEKS); (b) incorporate items determined to be free of possible sex, ethnic, and cultural bias and presumed acceptable by the educator review committees; and (c) resonate problem-solving and complex thinking skills” (p. 20). After a data review, the TEA develops tests from a bank of acceptable items. All test items are field-tested, which is performed either by incorporating items into operational tests or by giving separate field-test forms to specific groups of students. Field-test items are randomly distributed to students across the state of Texas to ensure that a large sample of responses is collected. Potential ethnic bias is examined by ensuring the sample selection is proportionate to the African American and Hispanic student populations in Texas (TEA, 2011b). Once an item is field-tested, TEA and Pearson Education curriculum and assessment specialists and psychometricians review each test item for objective and student expectation match, appropriateness, level of difficulty, and bias (i.e., economic, regional, cultural, sex, and ethnic) to determine whether the item is accepted or rejected.
Rejected items are no longer considered for future use on any test, while accepted items are placed in a test bank for future use (TEA, 2011b).

Differences in Assessments by Sex

Assessments are completed in a variety of subjects across all states, and each test is analyzed in a variety of ways, one of which is an examination of sex differences. Studies have found that sex differences on assessments are minimal; however, these assessments typically reveal that girls slightly outperform boys in reading, and vice versa for mathematics (Grodsky et al., 2008). However, the stereotype that girls do not perform well or are not good at mathematics seems questionable based on several studies that have found otherwise. For example, Hyde, Lindberg, Linn, Ellis, and Williams (2008) examined standardized tests for Grades 2 to 11 from all available states. Test items were coded from 1 to 4 based on question difficulty. The researchers found no sex difference in mathematics performance. It seems that the data suggesting that girls underperform in mathematics compared to boys may be changing with time. Scafidi and Bui (2010) replicated Hyde et al.’s (2008) study using data from the National Education Longitudinal Study for students in Grades 8, 10, and 12. The sample population included 9,813 students, of whom 51% were female. The results revealed that sex did not have an overall effect on performance. Despite the stereotype that girls underperform compared to boys on mathematics exams, Scafidi and Bui found that girls performed similarly to boys.

Ethnicity in Assessments

Ethnic or racial differences in testing have been evident in formal testing for many years. African American and Hispanic students generally score lower than do White students, while Asian students typically outscore all other ethnicities on standardized mathematics and reading assessments.
Differences among ethnicities declined and stabilized in both reading and mathematics at the secondary level through the 1980s (Grodsky et al., 2008). Carrier, Thomson, Tugurian, and Stevenson (2014) found that White students outperformed Asian, African American, and Hispanic students. The researchers noted that language or cultural biases could play a role in some ethnic groups outperforming others. Madaus and Clarke (2001) noted that high-stakes testing was an inequitable way to assess students of different races or cultures. Additionally, Vaughn (2010) examined the results of three African American students’ TAKS mathematics scores to determine characteristics that would help educators increase African American student performance on mathematics state assessments. The findings indicated that student motivation, lack of parental support, distractions in the classroom, and low expectations from teachers were reasons for low performance among these students. Ethnicity in high-stakes testing is important because it is one of the factors used when determining a state or federal accountability rating (NCLB, 2002).

Socioeconomic Status in Assessments

Socioeconomic status (SES) is reported to the state as part of a student’s demographic information. This information indicates whether the student is considered economically disadvantaged based on household income; it is confidential and not typically released unless needed as part of an educational interest for the child. Research has found that low-SES students do not perform as well as students who are not considered low SES (Spencer & Castano, 2007). Boaler (2003) indicated that low-SES students oftentimes feel labeled and say they do not perform as well on assessments because of an assumption that they will not do well.
Boaler suggested the label of “underperforming” can be detrimental to schools because students begin to see themselves as not worthy of doing well, even though their daily performance may be high. Boaler also found that low-SES students might not understand the vocabulary or terminology used in standardized tests, or they might not understand the examples given because they have not been exposed to those life experiences. This limited understanding places them at a disadvantage compared to students who are not considered low SES.

Conclusion

Assessments play an important role in education. Without assessments, educators would not be able to determine what students have learned from the lessons presented. Several types of assessments can be used to determine whether students have successfully learned the required content. Educators use formative and summative assessments to plan lessons, interventions, and activities, as well as to determine whether students have mastered the lesson objectives. While assessments have been a crucial part of educating students for many years, standardized testing has gained much attention in recent years. Standardized testing has garnered a negative perception because of the pressure that students, educators, and parents feel as results are published and schools are often graded based on test scores. Although standardized testing has cast a negative light on education, these exams are also a way of holding students and educators accountable for the learning that occurs in the classroom.

Chapter 3

METHOD OF PROCEDURE

Chapter 3 describes the setting and participants in this study, the data collected, and the analyses performed. The researcher used data from the 2011 Texas Assessment of Knowledge and Skills (TAKS) scores in mathematics for third-grade students from a small suburban school district, Gold Independent School District (ISD; pseudonym), in the southern central United States.
Data were also collected from three administrations of the AIMSweb curriculum-based measure (CBM) in mathematics for the 2010–2011 school year and for sex, ethnicity, and socioeconomic status (SES) for those same third-grade students. For many years, evaluations in mathematics have revealed deficits in performance for students in the United States (NCES, 1992). Data from a 2009 NAEP report showed that gains in students’ average mathematics scores seen in earlier years did not continue from 2007 to 2009 in Grade 4, and the overall average score for fourth graders in 2009 was unchanged from the score in 2007. This report revealed that students did not make gains in mathematics, but performance also did not decline. However, according to a 2013 NAEP report, the average mathematics scores for fourth and eighth graders increased between 1990 and 2011, with a higher percentage of students performing at the proficient level. Mathematics achievement is evaluated using numerous methods. One of the most popular methods is a standardized multiple-choice form of testing. Educators analyze data from these tests and make decisions about how instruction needs to be modified. Sex, ethnicity, and SES are also important factors to consider because many states, such as Texas, use these as factors when determining accountability ratings for schools (TEA, 2011a).

Design of the Study

The researcher conducted a quasi-experimental study to determine whether differences existed between the AIMSweb CBM (administered in fall, winter, and spring) and the TAKS mathematics tests, and whether differences existed due to sex, ethnicity, and SES. Data were obtained for the 2011 TAKS mathematics and AIMSweb CBM universal screeners in mathematics, as well as for sex, ethnicity, and SES, and were analyzed using the Statistical Package for the Social Sciences (SPSS) version 22. The dependent variable was the TAKS mathematics scale score.
The AIMSweb mathematics CBM screener places students into proficiency groups of well-above average, above average, average, below average, and well-below average. These proficiency groups, as well as sex, ethnicity, and SES, were the independent variables. The Kruskal-Wallis test of differences was appropriate for this study because it disaggregated the effects of the independent variables on the dependent variable and examined the unique contribution of each variable (Allison, 1999).

Research Questions

The following questions guided this study to determine whether differences exist in TAKS mathematics achievement among AIMSweb CBM mathematics proficiency levels, as well as by sex, ethnicity, and SES.

1. Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners?

2. Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners?

3. Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners?

4. Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners?

Research Hypotheses

The researcher tested the following hypotheses:

H1: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners.

H2: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners.
H3: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners.
H4: No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners.

Instrumentation

AIMSweb Universal Screener

AIMSweb is an assessment, data organization, and reporting system that provides the framework and data necessary for Response to Intervention (RtI) and multi-tiered instruction (Pearson Education, 2010). The CBM is a standardized test in mathematics based on general outcome measurement principles that educators can use to evaluate student progress efficiently and accurately. Students are given the universal screener three times in a school year, and scores are used to plan instruction and interventions. AIMSweb can be used to assess mathematics skills and can be generalized to any curriculum. The mathematics CBM is a timed, 8-minute, open-ended, paper-based test that can be administered in groups or individually (Pearson Education, 2012). The AIMSweb Mathematics Concepts and Applications (M-CAP) test was used in this study.

AIMSweb test developers used the following process to design the test. Mathematics probes were designed to incorporate understanding of number concepts and operations, as well as mathematical concepts, for each specific grade level. Writers with expertise in mathematics curricula developed the test items. After an internal review and editing, a group of mathematics teachers and other content experts reviewed the test and provided feedback. The test was then edited and revised based on that feedback. The M-CAP then went through three pilot studies and a national field test to establish reliability. Based on the field test, the median alternate-form reliability was .86.
Additionally, 60 cases from the field test were randomly selected to determine interrater reliability, which was .99 for third grade. Reliability was also established by monitoring a sample of students 10 times over the course of a school year; the reliability coefficient was .78 at Grade 3 based on a split-half measure from odd- and even-numbered probe administrations (Pearson Education, 2012). Students took the AIMSweb universal screener at the beginning, middle, and end of the school year. The screeners were administered at students’ current grade levels. Results from the screeners were manually entered into the AIMSweb system and were available immediately for reference. While the AIMSweb system offers a variety of reports, the one referenced in this study was the Scores and Percentile Report, which lists students in descending order based on proficiency groups (well-above average, above average, average, below average, and well-below average; Pearson Education, 2012).

TAKS Mathematics Subtest

The TAKS test was designed to measure the extent to which a student has learned and is able to apply the defined knowledge and skills at each tested grade level (TEA, 2011b). Evidence for the validity of the TAKS is organized into five categories: test content, response processes, internal structure, relations to other variables, and consequences of testing. The test development process is confidential; however, validity is established as a matter of degree rather than in an all-or-nothing manner, because test items are scaled and equated and test development is an ongoing process. Reliability of the TAKS is measured with the Kuder-Richardson 20 (KR20), which is used for multiple-choice tests. The coefficient alpha was .876 for the third-grade complete mathematics subtest (TEA, 2011b).

Sample Selection

This study took place in a small suburban school district in the south-central United States that consisted of nine elementary campuses.
The district comprised approximately 8,000 students, of whom 11% were African American; 22% Hispanic; 63% White; and 4% American Indian, Asian, Pacific Islander, or multiracial. Additionally, 27% of students in the district were economically disadvantaged, 5% limited English proficient (LEP), and 25% at risk for failure. The AIMSweb assessment used in this study was in its first year of implementation in the selected school district, with the universal screener in mathematics given three times during the year (fall, winter, and spring). Data from all three administrations of the test for third-grade students who met the study criteria were included in the analyses.

Data Gathering

Data were collected from the three administrations of the AIMSweb mathematics CBM and the TAKS mathematics test for third-grade students enrolled in Gold ISD. Data were also collected for sex, ethnicity, and SES. Students who took an alternate form of the TAKS (TAKS-Alternative, TAKS-Modified, or TAKS-Accommodated) were excluded from the study. At the time of this study, approximately 600 third-grade students were enrolled in the district, with 450 included in the study. The school district and the university Institutional Review Board (IRB) granted permission to conduct the study prior to data collection. Data were provided by the school district for the 2011 TAKS mathematics, the three administrations (fall, winter, and spring) of the AIMSweb CBM for the 2010–2011 school year, and student sex, ethnicity, and SES. All data were anonymous and unidentifiable to the researcher. Data were provided in electronic form in a spreadsheet that contained all information necessary to address each variable. All data were stored on the researcher’s personal computer and backed up on a password-protected flash drive and an external hard drive. The researcher will destroy all paper and electronic data 3 years after completing the requirements of the dissertation.
Treatment of Data

Upon receiving the data spreadsheet from the district representative, the researcher entered the data into SPSS version 22 for analysis. Before the data analyses were conducted, the researcher computed residual scores to remove the effects of sex, ethnicity, and SES as needed for each research question (Cohen, Cohen, West, & Aiken, 2003). This was done using multiple regression analyses with the TAKS mathematics scale score as the dependent variable and the relevant combination of sex, ethnicity, and SES as the independent variables. In essence, this procedure statistically removed the effects of variables that could confound the results (Cohen et al., 2003). The residual scores were saved and converted back to a TAKS-like score with similar means and standard deviations, and these new residual scores were used as the dependent variable in the analyses. The researcher used the Kruskal-Wallis test of differences between several independent groups as the nonparametric alternative to the one-way analysis of variance (ANOVA) because the sample sizes for the groups were disparate and violated the assumptions of relatively equal sample sizes and homogeneity of variance (Field, 2013). All possible comparison post hoc tests were computed to assess differences, using p values adjusted for the number of comparisons, to address each research question. Effect sizes were calculated for each comparison by dividing the standardized test statistic (z score) by the square root of the total number of students in the comparison (Field, 2013). The researcher computed three separate analyses, one for each administration of the AIMSweb mathematics universal screener in the fall, winter, and spring. The dependent variable for each analysis was the residualized version of the TAKS mathematics scale score.
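The two-step procedure described above can be sketched in a few lines of code. The following is an editorial illustration only, not the study's actual SPSS analysis: the data are simulated, and names such as `taks`, `demographics`, and `group` are hypothetical.

```python
# Illustrative sketch (not the study's SPSS analysis) of the procedure:
# (1) residualize TAKS scores on demographic covariates via multiple
# regression and rescale the residuals to a TAKS-like score, then
# (2) run a Kruskal-Wallis test across the five proficiency groups.
# All data here are simulated; variable names are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
demographics = rng.integers(0, 2, size=(n, 3)).astype(float)  # sex, SES, ethnicity dummies
group = rng.integers(0, 5, size=n)                            # 5 AIMSweb proficiency levels
taks = (580 + 20 * group
        + demographics @ np.array([5.0, -8.0, 6.0])
        + rng.normal(0, 80, n))                               # simulated scale scores

# Step 1: regress TAKS on the demographics (with intercept); keep residuals.
X = np.column_stack([np.ones(n), demographics])
beta, *_ = np.linalg.lstsq(X, taks, rcond=None)
residuals = taks - X @ beta

# Convert residuals back to a TAKS-like score with a similar mean and
# standard deviation (one way to do the rescaling the study describes).
taks_like = residuals / residuals.std() * taks.std() + taks.mean()

# Step 2: Kruskal-Wallis test of the residualized scores across groups.
samples = [taks_like[group == g] for g in range(5)]
H, p = stats.kruskal(*samples)
print(f"H({len(samples) - 1}) = {H:.2f}, p = {p:.4g}")
```

`scipy.stats.kruskal` computes the same H statistic the study reports; the post hoc comparisons and p-value adjustments were computed in SPSS and are not reproduced here.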
Although all possible comparisons were computed, only comparisons of the independent variables for each hypothesis at the same AIMSweb proficiency levels are reported in the tables and discussion because these were the primary groups of interest.

For Research Question 1, the TAKS mathematics score was adjusted for sex, ethnicity, and SES. This process statistically removed the effects of the three demographic variables from the mathematics scale score (Cohen et al., 2003). The independent variable was the proficiency level established by the AIMSweb screener: well-above average, above average, average, below average, and well-below average.

For Research Question 2, the TAKS mathematics score was adjusted for ethnicity and SES, statistically removing the effects of these two demographic variables from the mathematics scale score but leaving the effects of sex (Cohen et al., 2003). A new grouping variable was computed to account for sex at each proficiency level established by the AIMSweb screener. The following 10 groups were created: well-above average males, well-above average females, above average males, above average females, average males, average females, below average males, below average females, well-below average males, and well-below average females.

For Research Question 3, separate analyses were computed for each administration of the AIMSweb mathematics universal screener in the fall, winter, and spring. The dependent variable for each analysis was the residualized version of the TAKS mathematics scale score. The TAKS mathematics score was adjusted for ethnicity and sex, statistically removing the effects of these two demographic variables from the mathematics scale score but leaving the effects of SES (Cohen et al., 2003). A new grouping variable was computed to account for SES at each proficiency level established by the AIMSweb screener.
Ten groups were created: well-above average disadvantaged, well-above average not disadvantaged, above average disadvantaged, above average not disadvantaged, average disadvantaged, average not disadvantaged, below average disadvantaged, below average not disadvantaged, well-below average disadvantaged, and well-below average not disadvantaged.

For Research Question 4, separate analyses were computed for each administration of the AIMSweb mathematics universal screener in the fall, winter, and spring. The dependent variable for each analysis was the residualized version of the TAKS mathematics scale score. The TAKS mathematics score was adjusted for sex and SES, statistically removing the effects of these two demographic variables from the mathematics scale score but leaving the effects of ethnicity (Cohen et al., 2003). Because a larger percentage of students in this district were White (63%), ethnicity was disaggregated into two groups, White and other; other included all students with any reported ethnicity other than White. A new grouping variable was computed to account for ethnicity at each proficiency level established by the AIMSweb screener. Ten groups were created: well-above average other, well-above average White, above average other, above average White, average other, average White, below average other, below average White, well-below average other, and well-below average White.

Summary

To answer the research questions, data were obtained for third-grade students who took the TAKS mathematics and AIMSweb CBM in mathematics, as well as for sex, ethnicity, and SES. Data were analyzed to determine whether differences existed between AIMSweb CBM in mathematics and TAKS mathematics achievement overall, as well as by sex, ethnicity, and SES. Kruskal-Wallis tests of differences between several independent groups were used as the nonparametric alternative to the one-way ANOVA because sample sizes for the groups were disparate.
Chapter 4

PRESENTATION OF FINDINGS

The purpose of this study was to determine whether differences existed in Texas Assessment of Knowledge and Skills (TAKS) results across AIMSweb screener results for each administration of the test (fall, winter, and spring), while also considering ethnicity, socioeconomic status (SES), and sex for third-grade students. Mathematics TAKS and AIMSweb data were examined for third-grade students across the school district for the 2010–2011 school year. Students who took an alternate form of TAKS mathematics were eliminated from the study. In this quantitative study, the Kruskal-Wallis test of differences was used to answer the research questions. Sex, ethnicity, and SES were also analyzed to determine their influence on TAKS results in mathematics.

Research Question 1

Research Question 1: Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners?

Fall Administration

The well-above average group for the fall administration of the universal screener had the highest mean score on TAKS mathematics (M = 650.97, SD = 100.33). The well-below average group had the lowest mean score (M = 531.52, SD = 83.74); the effects of sex, SES, and ethnicity were removed. Mean scores and standard deviations for each group are listed in Table 1.

Table 1
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group in Fall 2010

Group                 n    M       SD
Well-above average    22   650.97  100.33
Above average         50   635.15   72.82
Average              134   615.36   86.92
Below average        178   584.31   87.54
Well-below average    79   531.52   83.74

Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted using p < .05 as the significance level (Field, 2013).
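As a cross-check on the effect sizes reported in the tables that follow, the rule stated in the Treatment of Data section (r equals the standardized test statistic divided by the square root of the total number of students) can be verified against the fall figures. The short computation below is an editorial illustration; it assumes N is the total fall sample of 463 (the sum of the Table 1 group sizes), an assumption that reproduces the reported values.

```python
# Editorial cross-check of the reported effect size for the above average
# vs. well-below average fall comparison (z = 6.40). Assumes N is the
# total fall sample, i.e., the sum of the Table 1 group sizes.
import math

N = 22 + 50 + 134 + 178 + 79   # Table 1 group sizes; N = 463
z = 6.40                       # standardized test statistic from Table 2
r = z / math.sqrt(N)           # effect size r
print(round(r, 2))             # .30, as reported
print(round(r ** 2, 2))        # .09, i.e., 9% of the variance
```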
The H was significant after removing the effects of ethnicity, sex, and SES, indicating that differences existed between the groups, H(4) = 63.39, p < .001. All possible comparison post hoc tests indicated that the well-above average, above average, and average groups scored significantly higher on the TAKS mathematics than the below average and well-below average groups; the differences among the well-above average, above average, and average groups themselves were not significant (see Table 2). The below average and well-below average groups were significantly different from all other groups, including each other. The differences between above average and well-below average (r = .30) and between average and well-below average (r = .30) yielded medium effect sizes, with the AIMSweb group accounting for 9% of the variance (Cohen, 1988). All other effect sizes were small.

Table 2
All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group in Fall 2010

Sample 1            Sample 2            Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average  Above average        0.39                        34.22           1.000       .02
Well-above average  Average              1.59                        30.77           1.000       .07
Well-above average  Below average        3.07                        30.23            .022       .14
Well-above average  Well-below average   5.21                        32.25           < .001      .24
Above average       Average              1.61                        22.17           1.000       .07
Above average       Below average        3.71                        21.41           < .001      .17
Above average       Well-below average   6.40                        24.17           < .001      .30
Average             Below average        2.86                        15.30            .042       .13
Average             Well-below average   6.40                        19.00           < .001      .30
Below average       Well-below average   4.17                        18.08           < .001      .19

Note. The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.

Winter Administration

The well-above average group for the winter administration of the universal screener had the highest mean score on TAKS mathematics (M = 657.39, SD = 79.06). The well-below average group had the lowest mean score (M = 538.53, SD = 87.77). Mean scores for each group are listed in Table 3; the effects of sex, SES, and ethnicity were removed.
Table 3
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group in Winter 2011

Group                 n    M       SD
Well-above average    29   657.39   79.06
Above average         67   649.10   83.15
Average              133   600.44   81.60
Below average        144   582.47   87.77
Well-below average    95   538.84   87.77

Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. The H was significant after removing the effects of ethnicity, sex, and SES, indicating differences among the groups, H(4) = 72.24, p < .001. All possible comparison post hoc tests indicated that the well-above average group was significantly different from all groups except the above average (see Table 4). The below average and average groups were significantly different from all other groups, but not from each other. The difference between above average and well-below average (r = .34) had a medium effect size, with the AIMSweb group accounting for 12% of the variance (Cohen, 1988). All other effect sizes were small.

Table 4
Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group in Winter 2011

Sample 1            Sample 2            Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average  Above average        0.43                        30.06           1.000       .02
Well-above average  Average              2.96                        27.71            .031       .14
Well-above average  Below average        3.97                        27.52            .006       .18
Well-above average  Well-below average   5.98                        28.69           < .001      .28
Above average       Average              3.41                        20.26            .006       .16
Above average       Below average        4.81                        20.00           < .001      .22
Above average       Well-below average   7.36                        21.57           < .001      .34
Average             Below average        1.66                        16.26            .096       .08
Average             Well-below average   4.93                        18.16           < .001      .23
Below average       Well-below average   3.49                        17.87            .005       .16

Note. The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.
Spring Administration

The well-above average group for the spring administration of the universal screener had the highest mean score on TAKS mathematics (M = 639.12, SD = 83.10). The well-below average group had the lowest mean score (M = 547.63, SD = 77.37). Mean scores for each group are listed in Table 5; the effects of sex, SES, and ethnicity were removed.

Table 5
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group in Spring 2011

Group                 n    M       SD
Well-above average    32   639.12   83.10
Above average         72   636.70   79.05
Average              152   600.59   87.54
Below average        136   574.85   97.30
Well-below average    80   547.63   77.37

Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. The H was significant after removing the effects of ethnicity, sex, and SES, indicating differences between the groups, H(4) = 50.59, p < .001. All possible comparison post hoc tests indicated that the well-above average group was significantly different from all groups except the above average and average groups (see Table 6). The above average group was significantly different from the average, below average, and well-below average groups, and the average group was significantly different from the well-below average group; the remaining comparisons, including below average versus well-below average, were not significant. The differences for all group comparisons yielded small effect sizes (Cohen, 1988).

Table 6
Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group in Spring 2011

Sample 1            Sample 2            Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average  Above average        0.19                        28.97           1.000       .01
Well-above average  Average              2.37                        26.52            .180       .11
Well-above average  Below average        3.53                        26.79            .004       .16
Well-above average  Well-below average   4.92                        28.52           < .001      .23
Above average       Average              2.94                        19.51            .003       .14
Above average       Below average        4.49                        19.88           < .001      .21
Above average       Well-below average   6.08                        22.15           < .001      .28
Average             Below average        1.98                        16.20            .474       .09
Average             Well-below average   4.12                        18.84           < .001      .19
Below average       Well-below average   2.38                        19.21            .175       .11

Note.
The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.

Null Hypothesis

The null hypothesis for Research Question 1 was, “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners.” Because significant differences existed in TAKS mathematics scale scores in the fall, winter, and spring administrations of the universal screeners after removing the effects of sex, SES, and ethnicity, the researcher rejected the null hypothesis.

Research Question 1 established that differences existed by AIMSweb group for all three administrations. The interest of Research Questions 2–4 was whether these differences were due to sex, SES, or ethnicity, respectively, at each specific level of the AIMSweb results. The following sections report the analyses for these variables.

Research Question 2

Research Question 2: Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners?

Fall Administration

The total number of students assessed during the fall administration was 463: 49.5% (n = 229) male and 50.5% (n = 234) female. The sex group sizes were comparable. The numbers, means, and standard deviations for each proficiency group are shown in Table 7. Scores were adjusted for ethnicity and SES, with only the effects for AIMSweb group and sex remaining. Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and sex, creating 10 new groups. The H was significant after removing the effects of ethnicity and SES, indicating differences between the groups, H(9) = 74.63, p < .001.
Although all possible comparisons were computed, only those comparing males and females within each AIMSweb group are reported. All possible comparison post hoc tests indicated that no differences existed by sex within AIMSweb groups (see Table 8). The differences for all group comparisons yielded small effect sizes.

Table 7
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Sex in Fall 2010

                     Males                  Females
Group                n    M       SD        n    M       SD
Well-above average   12   699.12   65.62    10   594.38  114.93
Above average        26   640.45   62.79    24   630.10   83.24
Average              72   621.52   85.19    62   609.30   88.97
Below average        80   592.46   83.41    98   576.23   90.59
Well-below average   39   526.25   68.81    40   536.70   94.66

Table 8
Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Sex in Fall 2010

Sample 1 (Male)      Sample 2 (Female)    Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average   Well-above average   -2.67                        57.24            .346       .12
Above average        Above average        -0.42                        37.84           1.000       .02
Average              Average              -0.79                        23.16           1.000       .04
Below average        Below average        -1.23                        20.14           1.000       .06
Well-below average   Well-below average    0.78                        30.08           1.000       .04

Note. The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.

Winter Administration

The total number of students assessed during the winter administration was 468; 50% (n = 232) were male and 50% (n = 236) were female. The sex group sizes were almost equal. The numbers, means, and standard deviations for each proficiency group are detailed in Table 9. Scores were adjusted for ethnicity and SES, with only the effects for AIMSweb group and sex remaining.
Table 9
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Sex in Winter 2011

                     Males                  Females
Group                n    M       SD        n    M       SD
Well-above average   17   671.47   68.96    12   640.79   93.27
Above average        30   645.60   86.43    37   650.22   79.69
Average              63   610.19   80.05    70   590.80   82.89
Below average        74   588.50   83.05    70   576.54   92.45
Well-below average   48   551.53   87.73    47   525.67   87.99

Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and sex, creating 10 groups. The H was significant after removing the effects of ethnicity and SES, indicating differences between the groups, H(9) = 77.38, p < .001. Although all possible comparisons were computed, only those comparing males and females within each AIMSweb group are reported in Table 10. All possible comparison post hoc tests indicated that no differences existed by sex within AIMSweb groups. The differences for all group comparisons yielded small effect sizes.

Table 10
Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Sex in Winter 2011

Sample 1 (Male)      Sample 2 (Female)    Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average   Well-above average   -1.09                        50.95           1.000       .05
Above average        Above average         0.76                        33.20           1.000       .04
Average              Average              -1.45                        23.68           1.000       .07
Below average        Below average        -0.55                        22.53           1.000       .03
Well-below average   Well-below average   -1.10                        27.73           1.000       .05

Note. The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.

Spring Administration

The total number of students assessed during the spring administration was 472; 50% (n = 236) were male and 50% (n = 236) were female. The sex group sizes were equal. The numbers, means, and standard deviations for each proficiency group are shown in Table 11.
Scores were adjusted for ethnicity and SES, with only the effects for AIMSweb group and sex remaining.

Table 11
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Sex in Spring 2011

                     Males                  Females
Group                n    M       SD        n    M       SD
Well-above average   15   643.37   73.20    17   634.63   92.76
Above average        38   663.40   74.26    34   607.72   79.17
Average              71   605.40   77.52    81   595.42   95.29
Below average        68   586.45   93.85    68   563.08  100.40
Well-below average   44   541.60   70.73    36   557.02   82.02

Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and sex, creating 10 groups. The H was significant after removing the effects of ethnicity and SES, indicating differences between the groups, H(9) = 59.72, p < .001. Although all possible comparisons were computed, only those comparing males and females within each AIMSweb group are reported in Table 12. All possible comparison post hoc tests indicated that no differences existed by sex within AIMSweb groups. The differences for all group comparisons yielded small effect sizes.

Table 12
Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Sex in Spring 2011

Sample 1 (Male)      Sample 2 (Female)    Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average   Well-above average   -0.53                        48.28           1.000       .02
Above average        Above average        -2.46                        32.17            .629       .11
Average              Average              -0.77                        22.16           1.000       .04
Below average        Below average         0.21                        23.37           1.000       .01
Well-below average   Well-below average    0.87                        30.63           1.000       .04

Note. The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.
Null Hypothesis

The null hypothesis for Research Question 2 was, “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners.” Because no significant differences existed between sex and AIMSweb proficiency level groups when assessing TAKS mathematics scale scores in the fall, winter, or spring administrations of the universal screeners after removing the effects of SES and ethnicity, the researcher failed to reject the null hypothesis.

Research Question 3

Research Question 3: Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners?

Fall Administration

The total number of students assessed during the fall administration was 463; 26% (n = 121) were economically disadvantaged and 74% (n = 342) were not economically disadvantaged. Because there were fewer economically disadvantaged students in the sample, the AIMSweb proficiency groups were not equally distributed by SES, particularly for the well-above average and above average groups. The numbers, means, and standard deviations for each proficiency group are shown in Table 13. Scores were adjusted for ethnicity and sex, with only the effects for AIMSweb group and SES remaining.

Table 13
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and SES in Fall 2010

                     Not Disadvantaged      Disadvantaged
Group                n    M       SD        n    M       SD
Well-above average   19   664.50  104.81     3   598.64   61.11
Above average        41   637.28   72.49     9   631.08   60.99
Average             107   623.83   80.61    27   592.26  103.22
Below average       123   591.26   89.17    55   564.08   79.10
Well-below average   52   540.97   88.61    27   510.50   70.16

Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted.
A new grouping variable was computed to combine the effects of AIMSweb group and SES, creating 10 groups. The H was significant after removing the effects of ethnicity and sex, indicating differences between the groups, H(9) = 78.94, p < .001. Although all possible comparisons were computed, only those comparing economically disadvantaged and not economically disadvantaged students within each AIMSweb group are reported in Table 14. All possible comparison post hoc tests indicated that no differences existed by SES within AIMSweb groups. The differences for all group comparisons yielded very small effect sizes.

Table 14
Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and Socioeconomic Status in Fall 2010

Sample 1 (Disadvantaged)  Sample 2 (Not Disadvantaged)  Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average        Well-above average             0.95                        83.10           1.000       .07
Above average             Above average                 -0.02                        49.24           1.000       .00
Average                   Average                        1.81                        28.81           1.000       .08
Below average             Below average                  0.09                        21.70           1.000       .00
Well-below average        Well-below average             1.40                        31.73           1.000       .07

Note. The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.

Winter Administration

The total number of students assessed during the winter administration was 468; 27% (n = 125) were economically disadvantaged and 73% (n = 343) were not economically disadvantaged. Because there were fewer economically disadvantaged students in the sample, the AIMSweb proficiency groups were not equally distributed by SES, particularly for the well-above average group. The numbers, means, and standard deviations for each proficiency group are shown in Table 15. Scores were adjusted for ethnicity and sex, with only the effects for AIMSweb group and SES remaining.
Table 15
Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and SES in Winter 2011

                     Not Disadvantaged      Disadvantaged
Group                n    M       SD        n    M       SD
Well-above average   27   671.24   78.34     2   548.11   47.78
Above average        52   650.09   82.73    15   646.49   76.33
Average              99   609.23   78.69    34   575.30   87.97
Below average        99   593.24   91.46    45   553.28   73.48
Well-below average   66   546.58   86.14    29   521.46   85.07

Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and SES, creating 10 groups. The H was significant after removing the effects of ethnicity and sex, indicating differences between the groups, H(9) = 89.32, p < .001. Although all possible comparisons were computed, only those comparing economically disadvantaged and not economically disadvantaged students within each AIMSweb group are reported in Table 16. All possible comparison post hoc tests indicated that no differences existed by SES within AIMSweb groups. The differences for all group comparisons yielded small or very small effect sizes.

Table 16
Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and SES in Winter 2011

Sample 1 (Disadvantaged)  Sample 2 (Not Disadvantaged)  Standardized Test Statistic  Standard Error  Adjusted p  r
Well-above average        Well-above average             1.89                        99.01           1.000       .01
Above average             Above average                 -0.06                        39.62           1.000       .00
Average                   Average                        1.92                        26.87           1.000       .09
Below average             Below average                  2.43                        24.31            .681       .11
Well-below average        Well-below average             0.27                        98.84           1.000       .01

Note. The standardized test statistic is a z score. The p value was adjusted for the number of comparisons.

Spring Administration

The total number of students assessed during the spring administration was 472; 26% (n = 125) were economically disadvantaged and 74% (n = 347) were not economically disadvantaged.
Because there were fewer economically disadvantaged students in the sample, the AIMSweb proficiency groups were not equally distributed by SES, particularly for the well-above average groups. The numbers, means, and standard deviations for each proficiency group are shown in Table 17. Scores were adjusted for ethnicity and sex, with only the effects for AIMSweb group and SES remaining. 51 Table 17 Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and SES in Spring 2011 Not Disadvantaged Disadvantaged Group n M SD n M SD Well-above average 29 648.76 80.24 3 589.78 111.00 Above average 57 644.28 79.30 15 611.61 76.54 Average 118 614.83 88.65 34 556.04 78.60 Below average 95 575.48 92.72 41 573.26 96.90 Well-below average 48 549.05 73.47 32 534.84 78.69 Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and SES, creating 10 groups. The H was significant after removing the effects of ethnicity and sex, which indicated differences between the groups, H(9) = 62.03, p < .001. Although all 40 possible comparisons were computed, only those comparing economically disadvantaged and not economically disadvantaged within each AIMSweb group are reported in Table 18. All possible comparison post hoc tests indicated that no differences existed by SES within AIMSweb groups. The differences between all group comparisons yielded small effect sizes. 
52 Table 18 Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb Group and SES in Spring 2011 Sample 1 (Disadvantaged) Sample 2 (Not Disadvantaged) Standardized Test Statistic Standard Error Adjusted p r Well-above average Well-above average -0.51 41.58 1.000 .06 Above average Above average -1.91 32.19 1.000 .09 Average Average 0.10 22.17 1.000 .00 Below average Below average -0.08 23.38 1.000 .00 Well-below average Well-below average 1.38 30.64 1.000 .06 Note. The standardized test statistic is a z-score. The p value was adjusted for the number of comparisons. Null Hypothesis The null hypothesis for Research Question 3 was “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners.” Because no significant differences existed between SES and AIMSweb proficiency levels when assessing TAKS mathematics scale scores in the fall, winter, or spring administrations of the universal screeners after removing the effects of sex and ethnicity, the researcher accepted the null hypothesis. Research Question 4 Research Question 4: Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners? The selected district had more White students than African American, Hispanic, or students of other ethnicities; therefore, group sizes were disparate when 53 disaggregating data by student ethnic group. For this reason, students were classified as White or other for the following analyses to lessen the disparity in group sizes. Fall Administration The total number of students assessed during the fall administration was 463; 60% (n = 280) were White and 40% (n = 183) were classified as other. 
Because fewer students were classified as other in the sample, the AIMSweb proficiency groups were not equally distributed by ethnicity, particularly for the above average, average, and below average groups. The numbers, means, and standard deviations for each proficiency group are shown in Table 19. Scores were adjusted for sex and SES, with only the effects for AIMSweb group and ethnicity remaining. Table 19 Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Ethnicity in Fall 2010 White Other Group n M SD n M SD Well-above average 14 651.85 93.88 8 649.51 114.40 Above average 38 643.94 76.14 12 617.75 63.46 Average 85 623.81 88.91 49 601.39 82.44 Below average 102 588.97 91.27 76 575.93 81.69 Well-below average 41 535.90 78.53 38 524.46 88.33 54 Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and ethnicity, creating 10 groups. The H was significant after removing the effects of SES and sex, which indicated differences between the groups, H(9) = 77.50, p < .001. Although all 40 possible comparisons were computed, only those comparing White students and students of other ethnicities within each AIMSweb group are reported in Table 20. All possible comparison post hoc tests indicated that no differences existed by ethnicity within AIMSweb groups. The differences between all group comparisons yielded very small effect sizes. Table 20 Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb and Ethnicity in Fall 2010 Sample 1 (Other) Sample 2 (White) Standardized Test Statistic Standard Error Adjusted p r Well-above Average Well-above Average -2.39 57.27 0.763 .06 Above Average Above Average 0.84 37.86 1.000 .04 Average Average 0.26 23.17 1.000 .01 Below Average Below Average -0.09 20.15 1.000 .00 Well-below Average Well-below Average 1.34 30.10 1.000 .06 Note. 
The standardized test statistic is a z-score. The p value was adjusted for the number of comparisons. 55 Winter Administration The total number of students assessed during the winter administration was 468; 61% (n = 284) were White and 39% (n = 184) were classified as other. Because fewer students were classified as other in the sample, the AIMSweb proficiency groups were not equally distributed by ethnicity. The numbers, means, and standard deviations for each proficiency group are shown in Table 21. Scores were adjusted for sex and SES, with only the effects for AIMSweb group and ethnicity remaining. Table 21 Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Ethnicity in Winter 2011 White Other Group n M SD n M SD Well-above average 23 656.93 64.10 6 671.93 121.12 Above average 44 648.16 81.98 23 652.20 84.17 Average 81 600.05 80.24 52 600.19 80.55 Below average 87 592.69 96.85 57 568.15 70.47 Well-below average 49 554.73 100.15 46 517.23 71.86 Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and ethnicity, creating 10 groups. The H was significant after removing the effects of SES and sex, which indicated differences between the groups, H(9) 56 = 78.51, p < .001. Although all 40 possible comparisons were computed, only those comparing White students and students of other ethnicities within each AIMSweb group are reported in Table 22. All possible comparison post hoc tests indicated that no differences existed by ethnicity within AIMSweb groups. The differences between all group comparisons yielded very small effect sizes. 
Table 22 Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb and Ethnicity in Winter 2011 Sample 1 (Other) Sample 2 (White) Standardized Test Statistic Standard Error Adjusted p r Well-above average Well-above average 0.58 61.98 1.000 .04 Above average Above average 0.35 34.79 1.000 .02 Average Average 0.17 24.02 1.000 .01 Below average Below average 0.96 23.04 1.000 .04 Well-below average Well-below average -0.82 27.76 1.000 .04 Note. The standardized test statistic is a z-score. The p value was adjusted for the number of comparisons. Spring Administration The total number of students assessed during the spring administration was 470; 61% (n = 288) were White and 39% (n = 184) were classified as other. Because fewer students were classified as other in the sample, the AIMSweb proficiency groups were not equally distributed by ethnicity, particularly for the average group. The numbers, means, and standard deviations 57 for each proficiency group are shown in Table 23. Scores were adjusted for sex and SES, with only the effects for AIMSweb group and ethnicity remaining. Table 23 Means and Standard Deviations for TAKS Math Scale Scores by AIMSweb Group and Ethnicity in Spring 2011 White Other Group n M SD n M SD Well-above average 26 648.16 84.86 6 616.42 74.85 Above average 50 638.77 72.65 22 636.53 90.95 Average 100 604.41 90.61 52 595.92 78.78 Below average 67 580.54 102.40 69 563.92 92.30 Well-below average 45 552.84 77.66 35 541.51 77.34 Because group sizes were so disproportionate, the Kruskal-Wallis test of differences between several independent groups was conducted. A new grouping variable was computed to combine the effects of AIMSweb group and ethnicity, creating 10 groups. The H was significant after removing the effects of SES and sex, which indicated differences between the groups, H(9) = 58.35, p < .001. 
Although all 40 possible comparisons were computed, only those comparing White students and students of other ethnicities within each AIMSweb group are reported in Table 24. All possible comparison post hoc tests indicated that no differences existed by ethnicity within AIMSweb groups. The differences between all group comparisons yielded very small effect sizes. 58 Table 24 Results from All Possible Comparisons for TAKS Math Scale Scores by AIMSweb and Ethnicity in Spring 2011 Sample 1 (Other) Sample 2 (White) Standardized Test Statistic Standard Error Adjusted p r Well-above average Well-above average 0.11 61.75 1.000 .01 Above average Above average 0.51 34.55 1.000 .02 Average Average 0.29 23.31 1.000 .01 Below average Below average 0.59 23.39 1.000 .03 Well-below average Well-below average 0.16 30.73 1.000 .01 Note. The standardized test statistic is a z-score. The p value was adjusted for the number of comparisons. Null Hypothesis The null hypothesis for Research Question 4 was “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners.” Because no significant differences existed between ethnicity and AIMSweb proficiency levels when assessing TAKS mathematics scale scores in the fall, winter, or spring administrations of the universal screeners after removing the effects of sex and SES, the researcher accepted the null hypothesis. Summary Kruskal-Wallis tests of differences between several independent groups were conducted to test the research hypotheses. The researcher failed to accept Hypothesis 1. Significant differences were found between TAKS mathematics scale scores and AIMSweb mathematics for the three administrations of the universal screener the groups after removing the effects of 59 ethnicity, sex, and SES. Additionally, the researcher failed to reject Hypotheses 2-4. 
No significant differences were found between each AIMSweb group for sex, ethnicity, or SES, after controlling for the other variables respectively. 60 Chapter 5 SUMMARY OF THE STUDY AND THE HYPOTHESES FINDINGS, CONCLUSIONS, IMPLICATIONS, RECOMMENDATIONS FOR FUTURE RESEARCH, AND SUMMARY The preceding chapters presented the review of the literature, methods, data collected, and data analysis of this study. Chapter 5 includes an explanation of the data, conclusions from that data, and findings that address implications for educators. Future research is suggested based on the findings of this study. Summary of the Study The purpose of this study was to determine whether differences existed in Texas Assessment of Knowledge and Skills (TAKS) results compared to AIMSweb screener results for each administration of the test (fall, winter, and spring), while also considering ethnicity, socioeconomic status (SES), and sex for third-grade students. AIMSweb is a web-based tool that provides multiple curriculum-based measurement (CBM) assessments for universal screenings given three times a year. The assessment measures general mathematics problem-solving skills expected in Grades 2-8. The focus of this study was third-grade students. Data from the AIMSweb screeners help teachers plan lessons, activities, or interventions to better meet students’ academic needs. The researcher examined data from the TAKS and AIMSweb mathematics tests for third-grade students across the school district for the 2010–2011 school year. Students who took an alternate form of TAKS mathematics were eliminated from the study. In this quantitative study, the Kruskal-Wallis test of differences was used to answer the research questions. Sex, ethnicity, and SES were also analyzed to determine whether significant differences existed between the TAKS and AIMSweb mathematics tests. 
61 Summary of Findings The null hypothesis for Research Question 1 was “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners.” Significant differences existed in TAKS mathematics scale scores on the fall, winter, and spring administrations of the universal screeners after removing the effects of sex, SES, and ethnicity. Therefore, the researcher rejected the null hypothesis. The null hypothesis for Research Question 2 was “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners.” No significant differences were found between sex and AIMSweb proficiency level groups when assessing TAKS mathematics scale scores on the fall, winter, or spring administrations of the universal screeners after removing the effects of SES and ethnicity. Therefore, the researcher accepted the null hypothesis. The null hypothesis for Research Question 3 was “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners.” No significant differences were found between SES and AIMSweb proficiency levels when assessing TAKS mathematics scale scores on the fall, winter, or spring administrations of the universal screeners after removing the effects of sex and ethnicity. Therefore, the researcher accepted the null hypothesis. 
The null hypothesis for Research Question 4 was “No statistically significant differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners.” No significant differences 62 existed between ethnicity and AIMSweb proficiency levels when assessing TAKS mathematics scale scores on the fall, winter, or spring administrations of the universal screeners, after removing the effects of sex and SES. Therefore, the researcher accepted the null hypothesis. Conclusions Research Question 1 Research Question 1 was, “Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels for each administration of AIMSweb mathematics universal screeners?” The researcher conducted a Kruskal-Wallis test of differences to examine differences between TAKS mathematics scale scores and AIMSweb mathematics for the three administrations of the universal screener. Significant differences existed between the TAKS mathematics scale scores and AIMSweb screener after removing the effects of ethnicity, sex, and SES for the fall, winter, and spring administrations of the AIMSweb screener. These finding indicate that teachers should analyze the proficiency groups of AIMSweb screeners to meet students’ needs effectively and to ensure they are ready for the end-of-the year state assessment. Knowing that each administration of AIMSweb and TAKS mathematics scale scores yielded a significant difference supports the importance of understanding the different proficiency levels, especially lower ones and can help teachers provide extra support for students at those levels. Students in the below average and well-below average groups on the beginning and middle of the year assessments need to be monitored, tutored, and offered interventions to help them master the skills needed to perform at a satisfactory level for the end-of-year test. 
If students in the lower proficiency groups are not provided extra support, they will more than likely not perform satisfactorily on the end-of-year TAKS mathematics test. 63 Teachers need to set aside time after each AIMSweb administration to analyze the data. Often times, tests such as AIMSweb are given as a district expectation; however, very little is actually done with the results. The findings of this study indicate the importance of teachers targeting and providing extra support to students who do not perform well on these tests. Students who scored in the average range are not as much of a concern; however, monitoring their progress from one test to the next is important to ensure that they do not fall into the below average or well-below average levels. Students who performed at the above average or well-above average proficiency levels tend to meet satisfactory levels on the end-of-year TAKS mathematics test. Research Question 2 Research Question 2 was, “Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of AIMSweb mathematics universal screeners?” A Kruskal-Wallis test of differences was conducted to examine differences between TAKS mathematics scale scores based on AIMSweb proficiency levels and sex for each administration of the AIMSweb mathematics universal screeners. Significant differences existed on the TAKS and AIMSweb mathematics for the fall, winter, and spring administrations after removing ethnicity and SES. All possible comparison post hoc tests indicated no significant differences in sex within AIMSweb groups. These findings indicate that student sex is not necessarily an important factor in student performance on the test. Based on the current finding sex does not play a significant role in connecting the results of the AIMSweb screeners with those of the TAKS. 
This finding does not mean sex is not important, it simply was not significant to the AIMSweb in relationship to the TAKS. This finding aligns with those of Scafidi and Bui (2010) and Hyde et al. (2008), which indicated that 64 sex did not have an effect on test scores. Dupree and Morote (2011) also found that sex was not a significant factor on a standardized social studies test. Research Question 3 Research Question 3 was, “Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of AIMSweb mathematics universal screeners?” A Kruskal-Wallis test of differences was conducted to examine differences TAKS mathematics scale scores based on AIMSweb proficiency levels and SES for each administration of the AIMSweb mathematics universal screeners. Significant differences existed for SES on the AIMSweb mathematics for the fall, winter, and spring after removing sex and ethnicity. However, all possible comparisons post hoc tests indicated that no differences in SES existed on the AIMSweb. The findings for Research Question 3 indicate that knowing whether a student is or is not economically disadvantaged is of little importance to whether he or she will do well on the end-of-year TAKS mathematics test. This lack of significance seems to contradict previous studies on SES and academic achievement. For example, Spencer and Castano (2007) revealed that low-SES students do not perform as well as those students who are not considered low SES. Boaler (2003) indicated that students considered low SES often times feel labeled and say they do not perform as well on assessments because of the assumption that they will not do well. 
This study also revealed that students considered low SES may not understand the vocabulary or terminology used for standardized tests or be able to understand the examples given because they have not been exposed to those life experiences, which puts them at a disadvantage compared to students not considered low SES. 65 Research Question 4 Research Question 4 was, “Do differences exist in TAKS mathematics scale scores based on AIMSweb proficiency levels and ethnicity for each administration of AIMSweb mathematics universal screeners?” A Kruskal-Wallis test of differences was conducted to examine differences between TAKS mathematics scale scores and AIMSweb proficiency levels based on ethnicity. Significant differences existed for the fall, winter, and spring administrations after removing sex and SES. However, all possible comparisons post hoc tests indicated no significant differences by ethnicity within AIMSweb group. As with Research Questions 2 and 3, no differences existed by ethnicity within AIMSweb groups in the post hoc comparisons. This finding is of importance to educators because ethnicity is not an indicator as to whether a student will perform well on the end-of-year TAKS mathematics test. This finding also contradicts previous research. For example, Carrier et al. (2014) found that White students significantly outperformed African American and Hispanic students. They noted that language or cultural biases could play a role in the performance of different ethnic groups. Madaus and Clarke (2001) identified that high-stakes testing is an inequitable way to assess students of different races or cultures. It is important to note that in the current study, the number of White students was larger than any other ethnic group alone and larger than all other ethnic groups being combined. Implications The results of the study indicate significant differences between AIMSweb and TAKS after removing the effects of sex, SES, and ethnicity. 
Educators who administer the AIMSweb screeners can use results to assess mathematics skills quickly. Teachers could then provide students who do not do well with interventions to teach them the necessary skills for their grade 66 levels. Additionally, educators could use the results of the AIMSweb screeners to tailor instruction that assists students in mastering the skills needed to be successful on the TAKS. Often, critics of tests such as the TAKS argue that educators teach to the test. However, the AIMSweb screeners are based on national standards for each grade level; these screeners were not designed with the TAKS test in mind. Nationally, students should be taught and master the skills needed to be successful for their particular grade levels. The current findings indicate that students who perform well throughout the year will also perform well on standardized tests given at the end of the school year. Students who do not do well on the screeners need to be provided with interventions, which effective teachers do on any given day. It is also important that teachers not assume that differences will exist in test scores based on sex, ethnicity, or SES. Rather teachers should focus on the actual student and not on a particular group of students. Teachers may not agree with standardized testing; however, effective teachers do not let their personal feelings dictate how they deliver instruction in the classroom. If teachers use the standards set for their particular grade levels to teach the necessary curriculum, students should do well on the end-of-year standardized tests given that these tests assess the standards designed for that particular grade level. High-stakes testing is often criticized by the public for the pressure and stress it places on teachers, students, and administrators. With the passing of No Child Left Behind (NCLB), the heightened stress is unavoidable. 
It is important that educators focus on the curriculum and standards set for each grade level, enforce effective teaching within the classroom, and monitor student performance throughout the year. Instruction should not be delivered based on the test; rather, it should be delivered based on the standards. The TAKS is designed to test the standards for each grade level, which means that teachers need to know those standards prior to the 67 beginning of the school year. The world of high-stakes testing within schools will more than likely remain a focus for schools and classrooms for the next several years. Recommendations for Future Research Future research is needed to examine the benefits of using universal screeners to predict a student’s ability to perform on future standardized mathematics tests. Because screeners are designed to be quick assessments, it is unclear whether the questions asked truly align with standardized testing questions. Standardized tests generally have longer questions with multi-step problems, whereas screeners are designed to assess specific skills. Educators use the results of screeners to provide students who do not perform well with assistance. Students who perform below average or well-below average on screeners are typically provided interventions to improve their knowledge base for skills not mastered. Therefore, more research should be conducted to determine whether such interventions help students perform better on universal screeners and whether that performance translates to success on standardized tests. Texas changed the standardized state test from the TAKS to the State of Texas Assessment of Academic Readiness (STAAR). Therefore, this study should be replicated using STAAR results to verify the current findings. Academic rigor is reported to be higher with the STAAR test; therefore, results may vary considering the higher rigor and a replication study should reveal such differences. 
This study focused on results of third-grade students, and should be replicated using other grade levels. While basic mathematics skills are the same, standardized tests increase with difficulty as students progress through the grade levels. A replicated study using a higher grade level could yield different results. Additionally, the findings of this study did not reveal sex 68 differences, which could be a focus of future studies, especially in the field of mathematics. The lack of sex difference in this study indicates a discrepancy from previous studies; therefore, future research needed in this area. However, it should be noted that recent students have indicated that the sex gap in mathematics is closing. This study did not yield differences in SES or ethnicity, which also contradicts previous findings. Therefore, additional research is needed in this area. Because data collection for this study occurred during the first year the selected district used the AIMSweb, a longitudinal study could be conducted to track trends among sex, ethnicity, and SES to establish patterns in the data. Summary The findings revealed a significant difference between AIMSweb mathematics levels and TAKS is closing performance in third grade. However, no significant differences between AIMSweb and TAKS were found regarding sex, SES, or ethnicity when the effects of these variables were isolated. The difference between AIMSweb and TAKS is beneficial to classroom teachers and administrators. The findings of this study can help guide instruction and indicate which students need additional support to be more successful in mathematics. It is important and pertinent that teachers and administrators not focus on any particular student group, rather, look at individual students and their performance throughout the school year. 69 REFERENCES Allison, P. D. (1999). Multiple regression: A primer. Thousand Oaks, CA: Pine Forge. Berlinger, D. (2011). 
Rational responses to high stakes testing: The case of the curriculum narrowing and the harm that follows. Cambridge Journal of Education, 41(3), 287-302. doi:10.1080/0305764X.2011.607151 Boaler, J. (2003). When learning no longer matters: Standardized testing and the creation of inequality. Phi Delta Kappan, 84(7), 502-506. doi:/10.1177/003172170308400706 Carrier, S. J., Thomson, M. M., Tugurian, L. P., & Stevenson, K. T. (2014). Elementary science education in classrooms and outdoors: Stakeholders views, gender, ethnicity, and testing. International Journal of Science Education, 36(13), 2195-2220. doi:10.1080 /09500693.2014.917342 Clarke, B., Baker, S., Smolkowski, K., & Chard, D. J. (2008). An analysis of early numeracy curriculum-based measurement: Examining the role of growth in student outcomes. Remedial and Special Education, 29(1), 46-57. doi:10.1177/0741932507309694 Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum. Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). New York, NY: Routledge. Cunningham, W., & Sanzo, T. (2002). Is high-stakes testing harming lower socioeconomic status school? NASSP Bulletin, 86(631), 62-75. doi:10.1177/019263650208663106 Deno, S. L. (1985). Curriculum-based assessment: The emerging alternative. Exceptional Children, 52(3), 219-232. 70 Deno, S. L. (1986). Formative evaluation of individual programs: A new role for school psychologists. School Psychology Review, 15, 358-374. Deno, S. L., & Espin, C. (1991). Evaluating strategies for preventing and remediating basic skills deficits. In G. Stoner, M. Shinn, & H. Walker (Eds.), Interventions for achievement and behavior problems (pp. 79-97). Silver Spring, MD: National Association of School Psychologists. Dupree, J. J., & Morote, E. S. (2011). 
The connections between students self-motivation, their classification (typical learners, academic intervention services learners, and gifted), and gender in a standardized social studies test. US-China Education Review, B1, 150-154. Retrieved from ERIC database (ED524867). Education Testing Service. (2003). Linking classroom assessment with student learning |