Chapter 4 Missing values
4.1 Description
For the purpose of Problem Set 4, we will use the “avengers” data set in the “fivethirtyeight” package to analyze some missing values and their patterns. We have two reasons for choosing this data set:
- First, our datasets do not have any missing values, and you can check out here: https://data.pentaq.com/TeamStats?tour=82.
- Second, we are big fans of Marvel Studio so we are passionate to analyze information about avengers. By looking through descriptions of this data set, we can measure mortality rates of 173 current and former avengers.
4.2 Data Resource
Let’s first take a glance at the data set.
## [1] "url" "name_alias" "appearances"
## [4] "current" "gender" "probationary_intro"
## [7] "full_reserve_avengers_intro" "year" "years_since_joining"
## [10] "honorary" "death1" "return1"
## [13] "death2" "return2" "death3"
## [16] "return3" "death4" "return4"
## [19] "death5" "return5" "notes"
Here are explanations of the column variables:
- url: The URL of the comic character on the Marvel Wikia
- name_alias: The full name or alias of the character, which will be abbreviated as “nm_l”
- appearances:The number of comic books that character appeared in as of April 30, which will be abbreviated as “appr”
- current: Is the member currently active on an avengers affiliated team?, which will be abbreviated as “crrn”
- gender: The recorded gender of the character, which will be abbreviated as “gndr”
- probationary_intro: Sometimes the character was given probationary status as an Avenger, this is the date that happened, which will be abbreviated as “prob_”
- full_reserve_avengers_intro: The month and year the character was introduced as a full or reserve member of the Avengers, which will be abbreviated as "f___"
- year: The year the character was introduced as a full or reserve member of the Avengers
- years_since_joining: 2015 minus the year, which will be abbreviated as "yr__"
- honorary: The status of the avenger, if they were given “Honorary” Avenger status, if they are simply in the “Academy,” or “Full” otherwise, which will be abbreviated as “hnrr”
- death1: TRUE if the Avenger died, FALSE if not, which will be abbreviated as “dth1”
- return1: TRUE if the Avenger returned from their first death, FALSE if they did not, blank if not applicable, which will be abbreviated as “rtr1”
- death2: TRUE if the Avenger died a second time after their revival, FALSE if they did not, blank if not applicable, which will be abbreviated as “dth2”
- return2: TRUE if the Avenger returned from their second death, FALSE if they did not, blank if not applicable, which will be abbreviated as “rtr2”
- death3: TRUE if the Avenger died a third time after their second revival, FALSE if they did not, blank if not applicable, which will be abbreviated as “dth3”
- return3: TRUE if the Avenger returned from their third death, FALSE if they did not, blank if not applicable, which will be abbreviated as “rtr3”
- death4: TRUE if the Avenger died a fourth time after their third revival, FALSE if they did not, blank if not applicable, which will be abbreviated as “dth4”
- return4: TRUE if the Avenger returned from their fourth death, FALSE if they did not, blank if not applicable, which will be abbreviated as “rtr4”
- death5: TRUE if the Avenger died a fifth time after their fourth revival, FALSE if they did not, blank if not applicable, which will be abbreviated as “dth5”
- return5: TRUE if the Avenger returned from their fifth death, FALSE if they did not, blank if not applicable, which will be abbreviated as “rtr5”
- notes: Descriptions of deaths and resurrections
4.3 Missing Patterns
Let’s create some missing plots to see some partterns of the “avengers” data set.

Now we observed some missing patterns.
- From the top graph:
- We observed that every comic character of this data set has an url on the Marvel Wikia and appears somewhere once in comic books as of April 30(appr); most of them have full name or alias while some does not(nm_l).
- Additionally, they all joined the Avengers before (year & yr__) and their current status on any avengers affiliated teams are recorded(crnn).
- More precisely, most of them joined the Avengers directly, which means they were not given probationary status as an Avenger(prb_).
- However, there are some cases that we do not know the month the character was introduced as a full or reserve member of the Avengers(f___).
- Their gender and honorary status(gndr & hnrr) are also recorded in this data set.
- Moreover, this data set documented whether every one of them died once or not(dth1) for all comic characters and even recorded whether some comic characters returned from their first death(rtr1). For some comic characters died once, we know descriptions of their deaths and resurrections(nots).
- In addition, for most comic characters in this data set, we do not know whether they died more than once and whether they returned more than once. There are a few cases that we know whether some characters died after their first return from their first death and we know whether they returned from their second death. For all comic characters in this data set except Mar-Vell and Jocasta, we do not know whether they died more than twice or not and whether they returned more than twice or not. Here are their information except the “notes:” information.
## row1
## url http://marvel.wikia.com/Mar-Vell_(Earth-616)#
## name_alias Mar-Vell
## appearances 254
## current FALSE
## gender MALE
## probationary_intro NA
## full_reserve_avengers_intro Jul-78
## year 1978
## years_since_joining 37
## honorary Full
## death1 TRUE
## return1 TRUE
## death2 TRUE
## return2 TRUE
## death3 TRUE
## return3 FALSE
## death4 NA
## return4 NA
## death5 NA
## return5 NA
## row2
## url http://marvel.wikia.com/Jocasta_(Earth-616)#
## name_alias Jocasta
## appearances 141
## current TRUE
## gender FEMALE
## probationary_intro Jul-80
## full_reserve_avengers_intro Nov-88
## year 1988
## years_since_joining 27
## honorary Full
## death1 TRUE
## return1 TRUE
## death2 TRUE
## return2 TRUE
## death3 TRUE
## return3 TRUE
## death4 TRUE
## return4 TRUE
## death5 TRUE
## return5 TRUE
- From the bottom two graphs, we can further observe the missing patterns:
- For around 85 comic characters in this data set, we don’t know whether they returned from their first death or how they died for the first time. As a result, we don’t know the stories including deaths and returns after their first death. (pattern 1, 4, 6, 9, 14)
- For around 40 comic characters in this data set, we do know whether they returned from their first death and how they died for the first time, but we still don’t know the stories after their first return if they had. (pattern 2, 5, 8)
- For around 10 comic characters in this data set, we know whether they died second time or returned second time, but not after that. (pattern 3, 11, 15, 16)
- Like we observed before, only one comic character does not have any missing values. (pattern 10)
- For most comic characters in this data set, we don’t know their when they were given probationary status as an Avenger. (pattern 1, 2, 3, 4, 7, 8, 9, 15, 16)