
An Evaluation of Assessment Equity for Special Education Students in Texas

Mistie Dakroub
Randy Hendricks
Craig Hammonds

University of Mary Hardin-Baylor

The use of standardized assessments to monitor the success and productivity of public school systems in the United States is arguably a necessary component of state accountability systems. This research study is not intended to challenge the mandate of accountability via testing, but the results of this study do raise questions regarding the equity of the assessment process required by the state of Texas for its public school system. The demographics of the Texas public school student population include a subgroup of special education (SPED) students who are currently charged with performing on state standardized tests at the same level as their general education counterparts. These SPED students previously participated in an assessment at a slightly lower rigor level than the cohort test, with fewer questions to complete. Texas policymakers removed this mid-level assessment, referred to as the STAAR Modified (STAAR M) test, after 2014 and moved forward with one testing option for most SPED students. This option assesses not only SPED students but also general education students (Texas Classroom Teachers Association, 2013). This study calls into question the need for a tiered testing system for SPED students who function in the mainstream population but require significant instructional accommodations or curricular modification. This quantitative study employs a readability analysis tool to assess the 2014 STAAR and the 2014 STAAR M to determine assessment readability levels for grades three through eight in the subjects of reading, social studies, and science. In addition, an analysis of passing percentages was conducted for SPED and non-SPED students tested in the same grade levels and subjects for 2012-2015; 2015 represents the first year that STAAR M assessments were not available to the SPED population. The data from the readability and passing percentage analyses were used to evaluate the degree of assessment equity for the Texas SPED student population after the elimination of the STAAR Modified assessment option.

The Special Education Instruction Assessment Divide in Texas

In 2012, Texas policymakers significantly increased the rigor of the state's public school accountability system by implementing the State of Texas Assessment of Academic Readiness (STAAR) testing program. This new testing program had three major testing options for students: the STAAR Alternate (STAAR Alt) test, which was intended for special education (SPED) students with more severe and profound disabilities; the STAAR Modified (STAAR M) test, which was for SPED students with significant curricular modifications or instructional accommodations in the mainstream classroom setting; and the STAAR test, which was the mainstream assessment for non-SPED Texas students. In 2015, Texas policymakers removed the STAAR M option for SPED students (Texas Education Agency, n.d.a). Since the current testing program limits the STAAR Alt to the most severely disabled students, this change potentially widens the assessment achievement gap between the majority of SPED students and their mainstream peers. The SPED students most adversely affected by the restructuring are those capable of attending mainstream classes but not skilled enough to function at the same academic level as their non-disabled cohort.

Special education teachers and campus administrators labor under the expectation that students in special education programs receive student-centered instruction tailored to each student's individual needs. Educators must follow the individual education plan (IEP) for each student, but this does not necessarily align with state assessment expectations. State testing based solely on the cohort grade level takes into account neither the specific needs of the special education student nor his or her IEP. The accommodations that a special education student can receive on the STAAR test include adjustments such as extended time, a small-group testing environment, and, in some cases, having the assessment read aloud. Some students also qualify for vocabulary assistance, but they must take the online version of the STAAR in order to access that particular accommodation (Texas Education Agency, n.d.b). All of these accommodations assist special education students, but they still leave a potential gap between the level of instruction and the level of assessment. Students who achieve two or three grades below grade level must be instructed at that level and remediated up to the cohort grade level. However, upward remediation typically does not happen in one year, and for some students it never happens. The assessment of all students should mirror the level of instruction they have received during the academic school year. Currently in Texas, however, the vast majority of special education students are held to an equivalent standard on the mandated STAAR assessments as their non-disabled peers, with only minor accommodations or assistance (Chomsky & Robichaud, 2014; Dodge, 2009). Given this instruction-assessment misalignment, the instructional expectation for many special education students may not reach the rigor of state-mandated assessments.

As the expectation in the United States to educate all students has grown, so has the need for best practices in educating students with diverse learning needs (Katsiyannis, Yell, & Bradley, 2001). Identified special education students have been exposed to a wide variety of educational environments since the 1975 inception of the Education for All Handicapped Children Act (EAHCA), the first federal law requiring special education services. Laws supporting the education of SPED students have progressed to keep public schools from isolating SPED students from the mainstream population; however, the Texas accountability system in the late 1990s and at the turn of the 21st century excluded the scores of special education students from the school rating system. Thus, the expectations for special education students on state assessments progressed from little or no accountability in the state's earlier testing programs to the current system based on results of either the STAAR or STAAR Alt (Duckworth, Tsukayama, & Quinn, 2012; LaVenture, 2003).

The present system allows school districts to declare a small percentage of students eligible for the STAAR Alt assessment, which is intended for special education students with severe and profound disabilities. School districts that over-identify special education students in the STAAR Alt assessment category receive automatic failures for the number of students over the allotted percentage of expected testers. Currently, each school district is limited to 1% of the student population taking the STAAR Alt test, meaning that any STAAR Alt testers beyond that 1% are counted as automatic failures in the accountability system. This practice forces school district administrators to push students out of STAAR Alt and into the standard STAAR assessment category, even with full knowledge that for some special education students the required IEP is not aligned with the academic rigor of the standard STAAR test (Greer & Meyen, 2009; Texas Education Agency, n.d.c). As mandated by federal law, the state of Texas has required the standardized assessment of all students, including SPED students, since No Child Left Behind (Cho & Kingston, 2011). Currently, Texas testing, ostensibly established based on the vocabulary level of the mainstream students in each grade, does not take into account the wide-ranging levels of SPED students. The researchers of this study assumed that readability analysis, using a calibration system that takes into account vocabulary and sentence structure to determine the grade level of a written piece, would be an essential part of test item selection to ensure that SPED students do not struggle to comprehend questions and passages that are above the tested grade reading level. To confirm that assumption, two separate emails were directed to the Texas Education Agency. One email specifically asked about the calibration tools used to determine the readability level of STAAR tests; the second email involved a similar inquiry. The question was posed to two different departments in an effort to ensure that the response was as accurate as possible. Both email responses contained the same answer: STAAR tests are not calibrated to determine reading level by an official calibration tool. Given that SPED students are often two or three grades below their peers in reading level, the need to ensure that SPED students are assessed equitably with regard to reading level seems pedagogically sound (Gillies, 2014; Welch, 1998).

The purpose of this study was to assess the impact that the elimination of the STAAR M test had on the SPED student testing population in Texas, specifically in the year following the last administration of the modified test. Students receiving special education services are protected by federal guidelines governing the educational environment that schools must provide for them. Students who perform below grade level on diagnostic testing and qualify for special education services have a right to instruction at their developmental level. Special education teachers must then work with each student to bring him or her up to grade level, or as close to grade level as feasible in the academic year. However, mainstream special education students in Texas are assessed with the cohort group that matches their grade level as opposed to the academic ability of individual students. Grouping special education students with mainstream students for testing purposes creates a chasm between the instructional experiences and assessment practices for special education students in the state of Texas (Bock & Erickson, 2015; Gillies, 2014). Given the high stakes associated with the Texas testing system, the need for adequate and appropriate assessment for special education students is a real and valid area for research (Gillies, 2014; Roach, Beddow, Kurz, Kettler, & Elliott, 2010).

Methodology and Findings

All data for this study were collected from the Texas Education Agency (TEA) and analyzed using SPSS statistical software. This causal-comparative, quantitative study was designed to address four research questions related to the readability levels of the 2014 modified and non-modified STAAR tests, as well as the passing rates, disaggregated by the 20 educational regions across Texas, for SPED and non-SPED students taking the STAAR and STAAR M assessments from 2012 through 2015. More specifically, the following research questions served to guide the study design and data analysis.

R1: Is there a statistically significant difference between readability levels of the passages/questions from the 2014 STAAR M assessments in grades three through eight in reading, science, and social studies compared with the readability analysis levels of the passages/questions from the 2014 standard STAAR assessments in grades three through eight in reading, science, and social studies?

R2: Is there a statistically significant difference between passing rates for special education students who took the modified 3rd through 8th grade reading STAAR tests across the years 2012, 2013, and 2014, and SPED students who took the standard STAAR test in 2015, compared to the passing rates of non-SPED students who took the standard 3rd through 8th grade reading STAAR test across the years 2012, 2013, 2014, and 2015?

R3: Is there a statistically significant difference between passing rates for special education students who took the modified 5th and 8th grade science STAAR tests across the years 2012, 2013, and 2014, and SPED students who took the standard STAAR test in 2015, compared to the passing rates of non-SPED students who took the standard 5th and 8th grade science STAAR tests across the years 2012, 2013, 2014, and 2015?

R4: Is there a statistically significant difference between passing rates for special education students who took the modified 8th grade social studies STAAR test across the years 2012, 2013, and 2014, and SPED students who took the standard STAAR test in 2015, compared to the passing rates of non-SPED students who took the standard 8th grade social studies STAAR test across the years 2012, 2013, 2014, and 2015?

Reading level analysis of the questions and passages of the 2014 regular and modified STAAR tests was conducted to determine the degree to which each was aligned with the designated grade levels for reading, science, and social studies assessments in grades three through eight. The 2014 STAAR assessment was chosen for the readability analysis since it represents the last year the modified tests were administered. If the non-modified and modified versions of the STAAR assessments significantly differ with regard to readability level, an argument can be made that any decline in SPED passing rates after the elimination of the modified tests is likely the result of a misalignment between SPED students' readability levels and the readability levels of the standard STAAR assessments, as opposed to instructional deficiencies. The researchers used the calibration system ReadablePro to determine the readability score for each question or passage assessed. Each passage or question analysis resulted in five separate indices scores, which comprise the ReadablePro calibration algorithm. The five indices scores were then averaged to determine an overall readability score. Individual indices included the Flesch-Kincaid, Gunning's Fog, Coleman-Liau, Simple Measure of Gobbledygook (SMOG), and the Automated Readability Index (ARI) (Readability Formulas, n.d.; ReadablePro Features, n.d.). As an example, Table 1 provides a breakdown of the results for the non-modified STAAR 3rd grade reading and the modified STAAR 3rd grade reading tests from 2014. The overall readability score, expressed as a grade level, is 7.0 and 4.1 for the non-modified and modified tests respectively. In this case, the non-modified test was found to be 4.0 grade levels above the designated grade level while the modified test was found to be 1.1 grade levels above.

Table 1. Example of Readability Score Determination for 3rd Grade Reading

Testing Format   Flesch-Kincaid   Gunning's Fog   Coleman-Liau   SMOG   ARI   Readability Score
Non-modified     5.9              7.6             7.3            8.9    5.3   7.0
Modified         2.7              4.4             4.8            6.3    2.1   4.1
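
The averaging step itself is simple to reproduce. Below is a minimal sketch assuming the open-source textstat package as a stand-in for the proprietary ReadablePro tool; it averages the same five indices, though ReadablePro's exact formulas and calibration may differ.

```python
# A minimal sketch of the five-index readability average described above,
# assuming the open-source `textstat` package (pip install textstat) as a
# stand-in for the proprietary ReadablePro tool.
import textstat

def overall_readability(text: str) -> float:
    """Average five grade-level readability indices for a passage or question."""
    indices = [
        textstat.flesch_kincaid_grade(text),         # Flesch-Kincaid
        textstat.gunning_fog(text),                  # Gunning's Fog
        textstat.coleman_liau_index(text),           # Coleman-Liau
        textstat.smog_index(text),                   # SMOG
        textstat.automated_readability_index(text),  # ARI
    ]
    return round(sum(indices) / len(indices), 1)

# Hypothetical 3rd grade passage: a score of 7.0 would sit 4.0 grade levels
# above the designated grade level, as in Table 1.
passage = "Open the gate and walk the long road to the quiet farm by the hill."
print(overall_readability(passage) - 3.0)
```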

One-way ANOVA procedures were conducted to address research questions two through four. The dependent variables for the ANOVA analyses were the passing rates for the student cohorts taking the different STAAR assessments, with the unit of analysis being the 20 education regions within the state of Texas. The independent variables included the combination of student classification (SPED or non-SPED) and the year of the test (2012 through 2015), which resulted in eight student cohort comparisons per analysis: four SPED cohorts (2012-2015) and four non-SPED cohorts (2012-2015). The ANOVA analyses were designed to determine if passing rates for SPED students mirror the passing rates for non-SPED students over time. If the modified version of STAAR is unnecessary for appropriate SPED assessment, then the passing rates for SPED students should follow the same pattern as the passing rates for non-SPED students over the same years of assessments. If, however, the elimination of the modified version of STAAR resulted in an instruction-assessment misalignment, the SPED passing rates after the elimination of modified assessments would be expected to show a marked departure from the previous trend (Office of Qualifications and Examinations Regulation, 2016).
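
To make the cohort design concrete, the sketch below assembles a hypothetical dataset with the same shape as the study's: one passing rate per education region for each of the eight cohorts. The column names (region, cohort, passing_rate) and all values are invented for illustration; the study's actual data came from TEA.

```python
# Hypothetical data in the study's shape: 20 regional passing rates for each
# of the eight cohorts (SPED/non-SPED x 2012-2015). Values are invented
# placeholders, NOT the study's TEA data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows = [
    {"region": region, "cohort": f"{status}-{year}",
     "passing_rate": rng.uniform(20, 90)}
    for status in ("SPED", "non-SPED")
    for year in (2012, 2013, 2014, 2015)
    for region in range(1, 21)  # the 20 Texas education service regions
]
df = pd.DataFrame(rows)  # 160 rows: 20 regions x 8 cohorts
```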

In conducting the ANOVA procedures, the researchers first reviewed the results of Levene's test to verify the assumption of homogeneity of variances. When Levene's test was significant (p ≤ 0.05), indicating a lack of homogeneity of variances, the researchers used the Welch's F statistic as the test for an overall significant difference between means and the Games-Howell post hoc test to identify the specific group means that differed significantly (Northern Arizona University, n.d.). When Levene's test was not significant, the researchers used the traditional ANOVA F statistic and Tukey's HSD as the overall significance test and post hoc test respectively. The researchers used an alpha level of .05 for all analyses. Partial eta squared (ηp²) was calculated as the measure of effect size for all significant F values; partial eta squared values of .01, .06, and .14 correspond to small, medium, and large effect sizes respectively (Stern, 2011). In addition, when a significant pairwise difference was found between the 2015 SPED student group and any of the other student groups, effect size was determined by calculating Hedges' g. Hedges' g values of 0.2, 0.5, and 0.8 demonstrate small, medium, and large pairwise differences respectively (Lakens, 2013).
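
Continuing from the hypothetical data frame sketched above, that decision logic can be expressed roughly as follows. The study itself used SPSS; here scipy and pingouin stand in for it (pingouin reports partial eta squared in its np2 column), and Hedges' g is computed by hand from the standard bias-corrected formula.

```python
# A sketch of the decision logic described above, continuing from the
# hypothetical `df` in the previous sketch; scipy and pingouin
# (pip install pingouin) stand in for SPSS.
import numpy as np
import pingouin as pg
from scipy import stats

groups = [g["passing_rate"].to_numpy() for _, g in df.groupby("cohort")]

# Levene's test for homogeneity of variances drives the choice of procedure.
if stats.levene(*groups).pvalue <= 0.05:
    anova = pg.welch_anova(data=df, dv="passing_rate", between="cohort")
    posthoc = pg.pairwise_gameshowell(data=df, dv="passing_rate", between="cohort")
else:
    anova = pg.anova(data=df, dv="passing_rate", between="cohort")
    posthoc = pg.pairwise_tukey(data=df, dv="passing_rate", between="cohort")

print(anova)  # partial eta squared appears in the 'np2' column

def hedges_g(x: np.ndarray, y: np.ndarray) -> float:
    """Cohen's d from the pooled SD, with the small-sample bias correction."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    d = (x.mean() - y.mean()) / pooled_sd
    return d * (1 - 3 / (4 * (nx + ny) - 9))  # correction per Lakens (2013)
```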

Readability Analysis


A paired-samples t test was conducted to analyze the readability of the 2014 STAAR M and 2014 STAAR assessments. Thirty-three data points were collected from both test versions in grades three through eight in the subjects of reading, social studies, and science. The analysis revealed a significant difference in readability between the non-modified (M = 1.86, SD = 2.88) and the modified assessments (M = 0.76, SD = 2.02), t(32) = -3.10, p = .004. This finding is consistent with the need for a lower readability level for the modified test but also indicates that both assessments are typically above the assigned grade level, with the non-modified test being 1.86 grade levels above and the modified test 0.76 grade levels above.

Table 2. Results of Paired-Samples t Test Analysis for Modified and Non-Modified Readability Levels

                M      n    SD      t       df   p
Non-modified    1.86   33   2.875   3.100   32   .004
Modified        0.76   33   2.022

NOTE: The mean (M) represents the average deviation from assigned grade levels.
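
The paired comparison itself can be reproduced along the following lines. The arrays below are synthetic stand-ins drawn to match the reported means and standard deviations, not the actual 33 deviation scores, and scipy again stands in for SPSS.

```python
# A sketch of the paired-samples t test reported in Table 2. The 33 scores
# per version are synthetic stand-ins matched to the reported M/SD, NOT the
# study's actual readability deviations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
non_modified = rng.normal(1.86, 2.875, 33)  # deviation from assigned grade level
modified = rng.normal(0.76, 2.022, 33)

result = stats.ttest_rel(modified, non_modified)
print(f"t(32) = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# The study reports t(32) = -3.10, p = .004.
```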

Passing Rate Analyses

One-way ANOVA procedures were conducted to assess research questions two through four. To facilitate the reporting of results, the findings are grouped by subject: (a) 3rd through 8th grade reading and (b) 5th grade science, 8th grade science, and 8th grade social studies. A line graph is provided for each analysis following the explanation to illustrate the results; individual ANOVA tables are included in the appendix.

Reading Grades Three through Eight.

As delineated in Table 3, the results of the one-way ANOVA analyses for the reading assessments were significant in all six cases, with large partial eta squared values for each. In addition, the post-hoc procedures indicated that the passing rate for the 2015 SPED group was significantly lower than the other seven student groups in all six analyses. The Hedges' g values for the 2015 SPED pairwise comparisons to the other student groups ranged from 2.30 to 6.44, which represents a large effect in all instances. As shown in the line graphs depicted in Figure 1, the elimination of the modified test in 2015 resulted in a precipitous decline for the 2015 SPED cohort on the reading STAAR tests in grades three through eight.

Table 3. One-way ANOVA Results for Reading Grades Three through Eight

Grade   Levene's   F         p        ηp²   Post-Hoc 2015 SPED Pairwise Comparisons   Hedges' g Range
3rd     p = .134   68.450    < .001   .76   All significant (p < .001)                3.29 – 5.16
4th     p = .178   77.417    < .001   .78   All significant (p < .001)                3.84 – 6.39
5th     p < .001   72.094    < .001   .81   All significant (p < .001)                2.24 – 4.43
6th     p = .003   66.162    < .001   .85   All significant (p < .001)                4.14 – 6.33
7th     p = .011   75.218    < .001   .85   All significant (p < .001)                4.19 – 6.44
8th     p < .001   116.70    < .001   .85   All significant (p < .001)                2.30 – 6.17

Figure 1. Passing rate comparisons for SPED and non-SPED students for reading grades three through eight.

Science Grades Five and Eight and Social Studies Grade Eight.

As shown in Table 4, the one-way ANOVA procedures for the two science and one social studies analyses resulted in significant differences and large effect sizes for all three. In addition, the post hoc procedures indicated significantly lower passing rates for the 2015 SPED group in all comparisons to other student groups. Hedges' g pairwise effect size values ranged from 2.15 to 8.83, which exceeds the 0.8 large-effect threshold in every case. The findings for the two science and one social studies assessments are consistent with the reading assessments: The elimination of the modified test after 2014 resulted in a precipitous decline in passing rates for the SPED testing population in 2015 on the two science assessments in grades five and eight and the one social studies assessment in grade eight (Figure 2).

Table 4. One-way ANOVA Results for Science Grades Five and Eight and Social Studies Grade Eight

Assessment    Levene's   F          p        ηp²   Post-Hoc 2015 SPED Pairwise Comparisons   Hedges' g Range
5th Science   p = .145   89.944     < .001   .81   All significant (p < .001)                2.15 – 5.09
8th Science   p = .124   130.926    < .001   .86   All significant (p < .001)                6.08 – 8.03
8th SS        p = .833   77.341     < .001   .78   All significant (p < .001)                5.15 – 8.83

Figure 2. Passing rate comparisons for SPED and non-SPED students for science grades five and eight and social studies grade eight. [Line graphs plot passing rates (20–70 on the y-axis) across 2012–2015 for SPED and non-SPED (NSPED) cohorts; panels include 8th Grade Social Studies.]

Conclusions and Recommendations

This study was designed to assess the suitability of the standard STAAR test as a pedagogically appropriate assessment for Texas SPED students. State policymakers reduced the tiered testing system of STAAR, STAAR M, and STAAR Alt to only the STAAR and STAAR Alt tests beginning in 2015 (Texas Classroom Teachers Association, 2013). SPED students who had previously taken the STAAR M, which had fewer questions, fewer answer choices, and often a lower reading level, would now take the STAAR test with their peers. This decision was based on federal mandates, but no significant changes were made to the STAAR test in anticipation of adding the SPED students to the testing cohort. The problem with this system is that the expectation and structure of the STAAR test do not match the instructional expectation for SPED students. SPED students are entitled to instructional accommodations and curricular modifications as a means to bridge the students' learning gaps over time, commonly over the span of multiple academic years (Fuchs & Fuchs, 2016). The standard STAAR test fails to take this factor into account.

The data resulting from this study show a plummet in scores: SPED students who had previously achieved closer to their cohort counterparts at each grade level and subject area took a distinct dip in performance in 2015, indicating that the STAAR M assessment was a more appropriate assessment option for students with disabilities. The readability analysis of the STAAR and STAAR M assessments in 2014, the last year the STAAR M was administered, shows that both tests were above grade level in readability. The STAAR M readability was slightly above grade level, instead of rating slightly below, as would be expected for students with SPED accommodations and modifications for learning disabilities (Hart, 2015).

Upon completing this study, the researchers recommend that Texas policymakers consider reinstating the three-tiered assessment format that takes into account students who qualify for SPED but whose disabilities are not severe or profound enough to qualify for the STAAR Alt assessment. This would also require federal bureaucrats at the U.S. Department of Education to acknowledge the legitimacy of a middle-tier assessment for SPED students in the mainstream classroom setting. If a return to the three-tiered assessment format is deemed unacceptable, policymakers should ensure that the standard STAAR assessments are at least closer to actual grade-level readability as determined by a legitimate calibration process. The need for accountability in Texas public schools was never a question in this research study, but the need for greater assessment equity for special education students is indicated based on the study findings; fundamental fairness would seem to demand it.

References

Bock, A., & Erickson, K. (2015). The influence of teacher epistemology and practice on student engagement in literacy learning. Research & Practice for Persons with Severe Disabilities, 40(2), 138–153. https://doi.org/10.1177/1540796915591987

Cho, H.-J., & Kingston, N. (2011). Capturing implicit policy from NCLB test type assignments of students with disabilities. Exceptional Children, 78(1), 58–72. http://0-search.ebscohost.com.umhblib.umhb.edu/login.aspx?direct=true&db=eue&AN=508487883&site=ehost-live

Chomsky, N., & Robichaud, A. (2014). Standardized testing as an assault on humanism and critical thinking in education. Radical Pedagogy, 11(1), 3–3. http://0-search.ebscohost.com.umhblib.umhb.edu/login.aspx?direct=true&db=eue&AN=94334249&site=ehost-live

Dodge, A. (2009). Heuristics and NCLB standardized tests: A convenient lie. International Journal of Progressive Education, 5(2), 6–22. http://0-search.ebscohost.com.umhblib.umhb.edu/login.aspx?direct=true&db=eue&AN=51837750&site=ehost-live

Duckworth, A. L., Tsukayama, E., & Quinn, P. D. (2012). What No Child Left Behind leaves behind: The roles of IQ and self-control in predicting standardized achievement test scores and report card grades. Journal of Educational Psychology, 104(2), 439–451. https://doi.org/10.1037/a0026280

Fuchs, D., & Fuchs, L. S. (2016). Responsiveness-to-intervention: A "systems" approach to instructional adaptation. Theory Into Practice, 55(3), 225–233. https://doi.org/10.1080/00405841.2016.1184536

Gillies, R. M. (2014). The role of assessment in informing interventions for students with special education needs. International Journal of Disability, Development & Education, 61(1), 1–5. https://doi.org/10.1080/1034912X.2014.878528

Greer, D. L., & Meyen, E. L. (2009). Special edu
