Examining data

In this section, you’ll learn how to tidy and analyze results in R.

Collecting data

We’ll run the experiment to collect data that we can examine.

instructions

Make sure your project is still Unpublished.
Delete any files that already exist in the experiment project page’s Results folder (make sure the dropdown list says Unpublished).
Run the experiment at least twice.
Save the results file as a CSV file named results.csv.

Reading in results

note

This section assumes prior knowledge of R.

We’ll use this sample results file, results.csv.

instructions

Add one of the following code blocks to your R script to:
1. Create a user-defined function that reads in a PennController results file in CSV format.
2. Read in results.csv and save it as a data frame named results.
Make sure that your R script and results.csv are in the same folder.

Base R version:

# Set working directory to source file location

# User-defined function to read in PCIbex Farm results files
read.pcibex <- function(filepath, auto.colnames=TRUE, fun.col=function(col,cols){cols[cols==col]<-paste(col,"Ibex",sep=".");return(cols)}) {
  n.cols <- max(count.fields(filepath,sep=",",quote=NULL),na.rm=TRUE)
  if (auto.colnames){
    cols <- c()
    con <- file(filepath, "r")
    while ( TRUE ) {
      line <- readLines(con, n = 1, warn=FALSE)
      if ( length(line) == 0) {
        break
      }
      m <- regmatches(line,regexec("^# (\\d+)\\. (.+)\\.$",line))[[1]]
      if (length(m) == 3) {
        index <- as.numeric(m[2])
        value <- m[3]
        if (is.function(fun.col)){
         cols <- fun.col(value,cols)
        }
        cols[index] <- value
        if (index == n.cols){
          break
        }
      }
    }
    close(con)
    return(read.csv(filepath, comment.char="#", header=FALSE, col.names=cols))
  }
  else{
    return(read.csv(filepath, comment.char="#", header=FALSE, col.names=seq(1:n.cols)))
  }
}

# Read in results file
results <- read.pcibex("results.csv")
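
The function above recovers column names from comment lines at the top of the results file that match the pattern `# <number>. <name>.`. As a minimal sketch of that matching step, the example below applies the same regular expression to a few made-up lines; the sample lines and the names `header_lines`, `is_header`, `col_index`, and `col_name` are illustrative assumptions, not part of the tutorial's script.

```r
# Hypothetical lines: two header comments and one data row
header_lines <- c("# 1. Results reception time.",
                  "# 2. MD5 hash of participant's IP address.",
                  "1603397451,abc123,PennController")

# Same pattern as in read.pcibex(): "# <index>. <column name>."
pattern <- "^# (\\d+)\\. (.+)\\.$"
matches <- regmatches(header_lines, regexec(pattern, header_lines))

# A successful match has 3 parts: the full match plus two capture groups
is_header <- lengths(matches) == 3
col_index <- as.numeric(sapply(matches[is_header], `[`, 2))
col_name  <- sapply(matches[is_header], `[`, 3)

print(data.frame(col_index, col_name))
```

Lines that do not match (such as the data row) yield an empty match and are skipped, which is why the function only fills in names for numbered header comments.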

If you’re using the tidyverse, you may see a warning like the following when you create the results tibble:


Warning: 8 parsing failures.
row col   expected     actual          file
  1  -- 17 columns 13 columns 'results.csv'
  2  -- 17 columns 13 columns 'results.csv'
  3  -- 17 columns 13 columns 'results.csv'
  4  -- 17 columns 13 columns 'results.csv'
 21  -- 17 columns 13 columns 'results.csv'
... ... .......... .......... .............
See problems(...) for more details.

Don’t worry! You can ignore this message. The readr::read_csv() function emits a warning because the rows have different numbers of columns:

The rows that log the "consent" and "instructions" trials have the default 13 columns.
The rows that log the "experimental-trial" trials have the default 13 columns plus 4 columns added by the log method (group, item, condition and ID).

If you’re using base R, the pre-installed utils::read.csv() function won’t throw such a warning.
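
To see the uneven row widths for yourself, the following base R sketch builds a tiny stand-in file and inspects it. The file contents and the column names (`order`, `label`, `event`, `group`, `item`) are invented for illustration and are much shorter than a real PennController results file.

```r
# Hypothetical miniature results file: one 3-field row, one 5-field row
path <- tempfile(fileext = ".csv")
writeLines(c("1,consent,start",
             "2,experimental-trial,start,B,4"), path)

# Number of comma-separated fields on each line
fields <- count.fields(path, sep = ",")
print(fields)

# Supplying five column names makes read.csv() pad the shorter row
# instead of warning about it
demo <- read.csv(path, header = FALSE,
                 col.names = c("order", "label", "event", "group", "item"))
print(demo)
```

In this sketch the shorter row is padded silently, so the missing numeric `item` value is read as NA.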

Tidying and analyzing data (optional)

note

This section uses the tidyverse to transform and analyze data (prior knowledge of the tidyverse assumed). The code blocks in this section are suggestions that can be modified as desired.

If you’re using base R, you can skip ahead to Wrapping up.

Tidyverse functions are designed to work with tidy data, meaning that:

Each variable must have its own column.
Each observation must have its own row.
Each value must have its own cell.

The results tibble is not tidy, because every "experimental-trial" trial is split into 4 rows:

Trial start
Information logged from the "side-by-side" Canvas
Information logged from the "selection" Selector
Trial end

Tidy the results tibble:


Add the following code block to your R script:

Keep only rows that log information about the "side-by-side" Canvas or "selection" Selector.
Keep only the ID, group, item, condition, PennElementName, Value, and EventTime columns.
Group by the ID and item variables.
Create the event and selection columns, and coerce the EventTime column from a character vector to a double vector.
Drop the PennElementName and Value columns (necessary for pivot_wider()).
“Widen” the tibble. For a more in-depth explanation, see Transforming data in R.
Save the tidied data as a new tibble named tidied_results.

# Load dplyr and tidyr (both are part of the tidyverse)
library(tidyverse)

tidied_results <- results %>%
  # Keep only the rows logged by the Canvas and the Selector
  filter(PennElementName == "side-by-side" | PennElementName == "selection") %>%
  select(ID, group, item, condition, PennElementName, Value, EventTime) %>%
  group_by(ID, item) %>%
  mutate(event = case_when(PennElementName == "side-by-side" ~ "canvas_time",
                           PennElementName == "selection" ~ "selection_time"),
         # Within each trial, record which image (if any) was selected
         selection = case_when("singular" %in% Value ~ "singular",
                               "plural" %in% Value ~ "plural",
                               TRUE ~ NA_character_),
         # An EventTime of "Never" becomes NA
         EventTime = if_else(EventTime == "Never", NA_real_, suppressWarnings(as.numeric(EventTime)))) %>%
  ungroup() %>%
  select(-PennElementName, -Value) %>%
  pivot_wider(names_from = event, values_from = EventTime)

Note: You may need to scroll to the right to see all the columns.

ID	group	item	condition	selection	canvas_time	selection_time
SOME_ID	B	4	plural	plural	1603397451156	1603397453036
SOME_ID	B	2	plural	NA	1603397454060	NA
SOME_ID	B	3	singular	singular	1603397457722	1603397459321
SOME_ID	B	1	singular	singular	1603397460332	1603397461856
ANOTHER_ID	B	1	singular	singular	1603398704462	1603398706007
ANOTHER_ID	B	4	plural	plural	1603398707019	1603398708549
ANOTHER_ID	B	3	singular	plural	1603398709562	1603398711692
ANOTHER_ID	B	2	plural	plural	1603398712705	1603398714189
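
To make the widening step more concrete, here is a rough base R analogue on a hypothetical two-trial data frame. It is not the tutorial's method (which uses pivot_wider()); the data frame `long` and the helper names `canvas`, `selection`, and `wide` are assumptions made for this sketch, with timestamps borrowed from the table above.

```r
# Long format: one row per event, two events per trial
long <- data.frame(
  ID        = "SOME_ID",
  item      = c(4, 4, 2, 2),
  event     = c("canvas_time", "selection_time",
                "canvas_time", "selection_time"),
  EventTime = c(1603397451156, 1603397453036, 1603397454060, NA)
)

# Split by event type and rename the time column after the event
canvas    <- long[long$event == "canvas_time",    c("ID", "item", "EventTime")]
selection <- long[long$event == "selection_time", c("ID", "item", "EventTime")]
names(canvas)[3]    <- "canvas_time"
names(selection)[3] <- "selection_time"

# Join the two halves so that each trial occupies a single row
wide <- merge(canvas, selection, by = c("ID", "item"))
print(wide)
```

Each trial now has one row, with one column per event, which is the same shape that the tidied tibble has.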

You can analyze the tidied data in a variety of ways, for example:

Calculate reaction times and response accuracy.
Calculate average reaction time by condition.
Calculate average response accuracy by participant.


Calculate reaction times and response accuracy:

Create the reaction_time column by subtracting the canvas_time value from the selection_time value. The resulting value is how long it took a participant to select an image once the images were printed to the screen.
Create the correct column by comparing the condition and selection columns. The resulting value is 1 if the participant selected the correct image, 0 if the participant selected the wrong image, and NA if the participant did not respond.

tidied_results <- tidied_results %>% 
  mutate(reaction_time = selection_time - canvas_time,
         correct = if_else(condition == selection, 1, 0))

Result:

Note: You may need to scroll to the right to see all the columns.

ID	group	item	condition	selection	canvas_time	selection_time	reaction_time	correct
SOME_ID	B	4	plural	plural	1603397451156	1603397453036	1880	1
SOME_ID	B	2	plural	NA	1603397454060	NA	NA	NA
SOME_ID	B	3	singular	singular	1603397457722	1603397459321	1599	1
SOME_ID	B	1	singular	singular	1603397460332	1603397461856	1524	1
ANOTHER_ID	B	1	singular	singular	1603398704462	1603398706007	1545	1
ANOTHER_ID	B	4	plural	plural	1603398707019	1603398708549	1530	1
ANOTHER_ID	B	3	singular	plural	1603398709562	1603398711692	2130	0
ANOTHER_ID	B	2	plural	plural	1603398712705	1603398714189	1484	1
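
As a worked example of the subtraction, the sketch below recomputes the first row of the table by hand. It assumes that the EventTime values are Unix timestamps in milliseconds, which is consistent with their magnitude but is an assumption of this example; the variable names are illustrative only.

```r
# Values copied from the first row of the table above
canvas_time    <- 1603397451156
selection_time <- 1603397453036

# Time between the images appearing and the selection
reaction_time_ms <- selection_time - canvas_time
reaction_time_s  <- reaction_time_ms / 1000

# Assumption: timestamps count milliseconds since 1970-01-01 UTC
started_at <- as.POSIXct(canvas_time / 1000, origin = "1970-01-01", tz = "UTC")

print(reaction_time_ms)
print(reaction_time_s)
print(started_at)
```

Under that assumption, reaction times in the table are in milliseconds, so dividing by 1000 gives seconds.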

Calculate the average reaction time by condition:
```
 tidied_results %>% 
   group_by(condition) %>% 
   summarize(avg_rt = mean(reaction_time, na.rm = TRUE),
             n = sum(!is.na(reaction_time)))
```
Result:

condition	avg_rt	n
plural	1631.	3
singular	1700.	4

- n is the number of items with a reaction time for a given condition, meaning that 1 item in the plural condition did not have a response.

Calculate average response accuracy by participant:
```
 tidied_results %>% 
   group_by(ID) %>% 
   summarize(accuracy = sum(correct, na.rm = TRUE) / sum(!is.na(correct)),
             answered = sum(!is.na(correct)) / n())
```
Result:

ID	accuracy	answered
ANOTHER_ID	0.75	1
SOME_ID	1	0.75

- The ANOTHER_ID participant had 75% accuracy and 100% completeness, meaning that they responded correctly to 3 out of 4 items.
- The SOME_ID participant had 100% accuracy and 75% completeness, meaning that they responded correctly to 3 out of 3 items, and did not respond to 1 item.
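
If you want to cross-check these summaries without the tidyverse, the following base R sketch recomputes them from the reaction times and accuracy values shown in the tidied table. The data frame `trials` is typed in by hand from that table, and the use of tapply() is a swapped-in alternative to the group_by() and summarize() calls above, not the tutorial's own approach.

```r
# Reaction times and accuracy copied from the tidied results table
trials <- data.frame(
  ID            = rep(c("SOME_ID", "ANOTHER_ID"), each = 4),
  condition     = c("plural", "plural", "singular", "singular",
                    "singular", "plural", "singular", "plural"),
  reaction_time = c(1880, NA, 1599, 1524, 1545, 1530, 2130, 1484),
  correct       = c(1, NA, 1, 1, 1, 1, 0, 1)
)

# Mean reaction time per condition, ignoring the unanswered trial
avg_rt <- tapply(trials$reaction_time, trials$condition, mean, na.rm = TRUE)

# Accuracy per participant: proportion correct among answered trials
accuracy <- tapply(trials$correct, trials$ID, mean, na.rm = TRUE)

# Completeness per participant: proportion of trials with a response
answered <- tapply(!is.na(trials$correct), trials$ID, mean)

print(round(avg_rt, 1))
print(cbind(accuracy, answered))
```

Taking the mean of a 0/1 column with na.rm = TRUE is equivalent to dividing the number of correct responses by the number of answered trials.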


Collecting actual data

Once you have examined and successfully analyzed the data from your test runs, stop editing your project. At this point (and not sooner), you are ready to publish your experiment:

instructions

Add the DebugOff command to turn off the debugger, since we’re now done writing the experiment script.
Click the Unpublished toggle in the Actions panel to change the experiment from unpublished to published.
Click Share and copy the link in the Data-collection link field.
Paste the experiment link into a new tab and take one final test run to make sure that publishing your experiment did not introduce new issues (it should not).
Click Results to open the data-collection results file and analyze it one more time.

 
// Type code below this line.

// Remove command prefix
PennController.ResetPrefix(null)

// Turn off debugger
DebugOff()

// Control trial sequence
Sequence("consent", "instructions", randomize("experimental-trial"), "completion_screen")

// Instructions
// code omitted in the interest of space