1 00:00:05,720 --> 00:02:02,960 >>WELCOME, I'M JEFF REZNICK. 2 00:02:02,960 --> 00:02:05,640 IT IS MY PRIVILEGE TO BE PART OF 3 00:02:05,640 --> 00:02:07,520 THIS PROGRAM AND TO INTRODUCE 4 00:02:07,520 --> 00:02:10,280 TODAY'S SPEAKER. 5 00:02:10,280 --> 00:02:11,640 DR. MICHELE WEIGLE IS A 6 00:02:11,640 --> 00:02:14,160 PROFESSOR IN THE DEPARTMENT OF 7 00:02:14,160 --> 00:02:17,320 COMPUTER SCIENCE AT OLD DOMINION 8 00:02:17,320 --> 00:02:24,920 UNIVERSITY. 9 00:02:24,920 --> 00:02:27,760 DR. WEIGLE HAS PUBLISHED 10 00:02:27,760 --> 00:02:30,920 OVER 150 ARTICLES AND SHE HAS 11 00:02:30,920 --> 00:02:37,600 SERVED AS PI OR CO-PI ON GRANTS, 12 00:02:37,600 --> 00:02:39,720 INCLUDING FROM THE NATIONAL SCIENCE 13 00:02:39,720 --> 00:02:41,040 FOUNDATION, THE NATIONAL 14 00:02:41,040 --> 00:02:41,200 ENDOWMENT 15 00:02:41,200 --> 00:02:43,280 FOR THE HUMANITIES, THE 16 00:02:43,280 --> 00:02:46,160 INSTITUTE OF MUSEUM AND LIBRARY 17 00:02:46,160 --> 00:02:49,040 SERVICES AND THE MELLON 18 00:02:49,040 --> 00:02:49,360 FOUNDATION. 19 00:02:49,360 --> 00:02:51,200 SHE CURRENTLY SERVES ON THE 20 00:02:51,200 --> 00:02:53,640 EDITORIAL BOARDS OF THE JOURNAL 21 00:02:53,640 --> 00:02:56,200 OF THE ASSOCIATION FOR INFORMATION 22 00:02:56,200 --> 00:02:59,000 SCIENCE AND TECHNOLOGY. 23 00:02:59,000 --> 00:03:00,720 I'M THRILLED THAT SHE CAN JOIN 24 00:03:00,720 --> 00:03:07,560 US TODAY TO SPEAK ON WHAT IS IN A 25 00:03:07,560 --> 00:03:09,760 WEB ARCHIVE COLLECTION? 26 00:03:09,760 --> 00:03:11,640 SUMMARIZATION AND DISCOVERY OF 27 00:03:11,640 --> 00:03:13,000 ARCHIVED WEB PAGES. 28 00:03:13,000 --> 00:03:13,520 WELCOME. 29 00:03:13,520 --> 00:03:16,200 >>THANK YOU, CHRISTIE. 30 00:03:16,200 --> 00:03:18,520 I'M HONORED TO GIVE A TALK IN 31 00:03:18,520 --> 00:03:19,320 THIS GREAT SERIES. 
32 00:03:19,320 --> 00:03:21,120 BEFORE I GET INTO THE MAIN TOPIC 33 00:03:21,120 --> 00:03:23,560 I WANT TO GIVE A BIT OF CONTEXT 34 00:03:23,560 --> 00:03:24,800 ABOUT OUR RESEARCH GROUP AND 35 00:03:24,800 --> 00:03:26,360 WHAT WE STUDY. 36 00:03:26,360 --> 00:03:28,080 WE'RE COMPUTER SCIENTISTS AND 37 00:03:28,080 --> 00:03:30,680 INSTEAD OF USING THE CONTENT OF 38 00:03:30,680 --> 00:03:32,920 THE WEB OR WEB ARCHIVES FOR 39 00:03:32,920 --> 00:03:36,040 RESEARCH OUR MAIN FOCUS IS IN 40 00:03:36,040 --> 00:03:37,200 STUDYING THE REPRESENTATION OF 41 00:03:37,200 --> 00:03:38,680 THE CONTENT ON THE WEB AND HOW 42 00:03:38,680 --> 00:03:40,920 WELL THE WEB, AS IT CHANGES OVER 43 00:03:40,920 --> 00:03:44,440 TIME, IS AND CAN BE ARCHIVED. 44 00:03:44,440 --> 00:03:45,880 AND WE BUILD PROOF-OF-CONCEPT 45 00:03:45,880 --> 00:03:48,240 TOOLS TO HELP FACILITATE THE USE 46 00:03:48,240 --> 00:03:50,800 OF WEB ARCHIVES BY RESEARCHERS 47 00:03:50,800 --> 00:03:53,000 IN OTHER FIELDS. 48 00:03:53,000 --> 00:03:54,840 TODAY I WILL BE GIVING AN 49 00:03:54,840 --> 00:03:56,640 INTRODUCTION TO WEB ARCHIVING AND WHY IT'S SO 50 00:03:56,640 --> 00:03:58,040 IMPORTANT AND WHY SOMETIMES IT'S 51 00:03:58,040 --> 00:03:58,800 SO HARD. 52 00:03:58,800 --> 00:04:01,360 AND I WILL BE HIGHLIGHTING WORK 53 00:04:01,360 --> 00:04:03,600 THAT OUR GROUP HAS DONE TO STUDY 54 00:04:03,600 --> 00:04:05,280 AND ADDRESS SOME OF THESE 55 00:04:05,280 --> 00:04:06,160 CHALLENGES, INCLUDING 56 00:04:06,160 --> 00:04:08,080 HOW WE CAN MAKE WEB ARCHIVES 57 00:04:08,080 --> 00:04:09,840 MORE ACCESSIBLE TO RESEARCHERS 58 00:04:09,840 --> 00:04:11,680 WHO WANT TO STUDY THE CONTENT 59 00:04:11,680 --> 00:04:14,160 AND HOW TO MAKE THAT CONTENT 60 00:04:14,160 --> 00:04:15,960 MORE EASILY DISCOVERABLE. 
61 00:04:15,960 --> 00:04:18,000 THESE SLIDES CONTAIN LOTS OF 62 00:04:18,000 --> 00:04:20,400 REFERENCES AND LINKS SO I'VE 63 00:04:20,400 --> 00:04:21,240 MADE THEM AVAILABLE FOR THOSE 64 00:04:21,240 --> 00:04:23,240 WHO ARE INTERESTED IN FINDING 65 00:04:23,240 --> 00:04:24,680 MORE INFORMATION. 66 00:04:24,680 --> 00:04:27,240 THE SHORT LINK IS ON THE SLIDE 67 00:04:27,240 --> 00:04:30,120 HERE AND WILL APPEAR AGAIN ON MY 68 00:04:30,120 --> 00:04:37,360 LAST SLIDE. 69 00:04:37,360 --> 00:04:43,080 SO THIS IS OUR WEB SCIENCE AND DIGITAL 70 00:04:43,080 --> 00:04:49,120 LIBRARIES GROUP, WS-DL AS WE CALL IT, 71 00:04:49,120 --> 00:04:50,520 AND THE PARTICULAR WORKS AND 72 00:04:50,520 --> 00:04:52,720 PROJECTS THAT I'LL MENTION TODAY 73 00:04:52,720 --> 00:04:56,200 WERE DONE IN COLLABORATION WITH 74 00:04:56,200 --> 00:04:58,240 MY COLLEAGUE MICHAEL NELSON AND 75 00:04:58,240 --> 00:05:00,680 THE STUDENTS, MANY OF WHOM ARE 76 00:05:00,680 --> 00:05:08,520 NOW ALUMNI. 77 00:05:08,520 --> 00:05:10,280 WE WOULD NOT BE ABLE TO DO ANY 78 00:05:10,280 --> 00:05:12,680 OF THIS WORK WITHOUT OUR 79 00:05:12,680 --> 00:05:13,120 GRADUATE STUDENTS. 80 00:05:13,120 --> 00:05:14,920 AND I'D ALSO LIKE TO ACKNOWLEDGE 81 00:05:14,920 --> 00:05:17,520 AND THANK THE AGENCIES THAT HAVE 82 00:05:17,520 --> 00:05:27,080 FUNDED MUCH OF THIS WORK. 83 00:05:27,080 --> 00:05:29,840 SO I THINK IT'S APPARENT TO 84 00:05:29,840 --> 00:05:31,280 EVERYONE NOW THAT THE WEB IS 85 00:05:31,280 --> 00:05:33,280 ESSENTIAL TO OUR DAILY LIVES. 86 00:05:33,280 --> 00:05:35,960 IT'S WHERE MANY OF US GO TO FIND OUT 87 00:05:35,960 --> 00:05:39,360 WHAT IS HAPPENING IN THE WORLD. 88 00:05:39,360 --> 00:05:41,720 AND LOOKING AT THE WEB FROM THE 89 00:05:41,720 --> 00:05:44,360 VANTAGE POINT OF SPRING 2020 90 00:05:44,360 --> 00:05:45,800 CAN HELP US REMEMBER WHAT IT 91 00:05:45,800 --> 00:05:47,240 FELT LIKE IN THE EARLY DAYS OF 92 00:05:47,240 --> 00:05:48,240 THE PANDEMIC. 
93 00:05:48,240 --> 00:05:49,960 SO MUCH HAS HAPPENED IN THE TWO 94 00:05:49,960 --> 00:05:52,080 AND A HALF YEARS SINCE THEN. 95 00:05:52,080 --> 00:05:53,640 AND IT'S SOMETIMES HARD TO REMEMBER 96 00:05:53,640 --> 00:05:56,840 SOME OF THE FEELINGS OF ALARM, 97 00:05:56,840 --> 00:05:59,200 ANXIETY, UNCERTAINTY AND EVEN 98 00:05:59,200 --> 00:06:01,200 LONELINESS AS WE HAD TO CURTAIL 99 00:06:01,200 --> 00:06:01,960 OUR ACTIVITIES. 100 00:06:01,960 --> 00:06:04,280 FOR ME AS A COLLEGE BASKETBALL 101 00:06:04,280 --> 00:06:06,600 FAN THINGS REALLY HIT HOME. 102 00:06:06,600 --> 00:06:10,680 THEY GOT REAL WHEN THE NCAA 103 00:06:10,680 --> 00:06:15,320 CANCELED THE MARCH MADNESS 104 00:06:15,320 --> 00:06:18,320 BASKETBALL TOURNAMENT. 105 00:06:18,320 --> 00:06:20,800 AND AS YOU CAN SEE IN THE SCREEN 106 00:06:20,800 --> 00:06:23,680 SHOT FROM MARCH 13, 2020 THE 107 00:06:23,680 --> 00:06:27,760 BANNER AT ESPN.com HAS THIS 108 00:06:27,760 --> 00:06:29,560 LONG LIST OF BASKETBALL GAMES 109 00:06:29,560 --> 00:06:32,200 THAT HAD BEEN CANCELED. 110 00:06:32,200 --> 00:06:36,240 AND THERE IS A SCREEN SHOT FROM 111 00:06:36,240 --> 00:06:38,520 CNN.com WITH A PICTURE FROM A 112 00:06:38,520 --> 00:06:40,760 GROCERY STORE AND NOTHING QUITE 113 00:06:40,760 --> 00:06:42,520 CONVEYS GENERAL PUBLIC PANIC 114 00:06:42,520 --> 00:06:45,360 LIKE LONG LINES AND FULL GROCERY 115 00:06:45,360 --> 00:06:47,640 CARTS. 116 00:06:47,640 --> 00:06:51,080 BUT WEB PAGES THAT EVOKE THESE 117 00:06:51,080 --> 00:06:52,240 FEELINGS CAN CHANGE QUICKLY AND 118 00:06:52,240 --> 00:06:53,800 THIS IS ESPECIALLY TRUE FOR WEB 119 00:06:53,800 --> 00:06:55,920 PAGES THAT ARE BUILT TO STAY 120 00:06:55,920 --> 00:06:59,440 CURRENT LIKE THE MAIN PAGES OF 121 00:06:59,440 --> 00:07:02,760 ESPN.com AND CNN.com. 
122 00:07:02,760 --> 00:07:07,200 SO THE REASON THAT WE NOW IN 123 00:07:07,200 --> 00:07:09,080 LATE 2022 HAVE ACCESS TO THESE 124 00:07:09,080 --> 00:07:11,440 WEB PAGES FROM SPRING 2020 IS 125 00:07:11,440 --> 00:07:13,040 BECAUSE OF WEB ARCHIVES, AND THE 126 00:07:13,040 --> 00:07:14,840 IMPORTANCE OF CAPTURING THE FULL 127 00:07:14,840 --> 00:07:17,040 CONTENT OF WEB PAGES INSTEAD OF 128 00:07:17,040 --> 00:07:20,160 JUST CAPTURING SCREEN SHOTS IS THAT 129 00:07:20,160 --> 00:07:22,880 WE CAN GO BACK AND INTERACT. 130 00:07:22,880 --> 00:07:25,120 WE CAN CLICK LINKS, SCROLL 131 00:07:25,120 --> 00:07:25,800 AROUND. 132 00:07:25,800 --> 00:07:28,440 THE JAVASCRIPT CAN EXECUTE AND 133 00:07:28,440 --> 00:07:30,000 WE CAN GET A FULLER EXPERIENCE 134 00:07:30,000 --> 00:07:33,360 OF WHAT THAT WEB PAGE WAS LIKE 135 00:07:33,360 --> 00:07:36,280 THAN WITH JUST A STATIC IMAGE. 136 00:07:36,280 --> 00:07:41,400 SO A FEW YEARS AGO I WROTE AN 137 00:07:41,400 --> 00:07:43,040 ARTICLE ABOUT WEB ARCHIVING AND 138 00:07:43,040 --> 00:07:45,040 SOME OF OUR GROUP'S OTHER WORK IN 139 00:07:45,040 --> 00:07:45,640 THE AREA. 140 00:07:45,640 --> 00:07:47,520 I WON'T BE DISCUSSING THAT 141 00:07:47,520 --> 00:07:48,880 ARTICLE TODAY BUT IT WAS WRITTEN 142 00:07:48,880 --> 00:07:51,320 FOR A GENERAL ACADEMIC AUDIENCE 143 00:07:51,320 --> 00:07:53,080 SO I WANTED TO SHARE THE LINK 144 00:07:53,080 --> 00:07:56,560 WITH YOU HERE TODAY. 145 00:07:56,560 --> 00:07:58,920 SO, I'VE BEEN STUDYING WEB 146 00:07:58,920 --> 00:08:00,920 ARCHIVES FOR ABOUT 10 YEARS NOW 147 00:08:00,920 --> 00:08:03,120 AND WHEN I FIRST STARTED WE HAD 148 00:08:03,120 --> 00:08:05,240 TO SELL THIS IDEA THAT PEOPLE 149 00:08:05,240 --> 00:08:09,440 WOULD WANT TO LOOK AT "OLD WEB 150 00:08:09,440 --> 00:08:10,480 PAGES." 
151 00:08:10,480 --> 00:08:12,480 RESEARCHERS AND SCHOLARS HAVE 152 00:08:12,480 --> 00:08:13,840 GOTTEN THAT POINT NOW BUT I 153 00:08:13,840 --> 00:08:17,600 WOULD LIKE TO INCLUDE THIS 154 00:08:17,600 --> 00:08:18,720 QUOTE. 155 00:08:18,720 --> 00:08:22,080 HE SAID YOU CANNOT STUDY THE 156 00:08:22,080 --> 00:08:27,480 1990s WITHOUT WEB ARCHIVES. 157 00:08:27,480 --> 00:08:31,080 I SHUDDER TO THINK THIS COULD BE 158 00:08:31,080 --> 00:08:32,240 HISTORY. 159 00:08:32,240 --> 00:08:35,320 BUT WHAT HE SAYS IS VERY TRUE. 160 00:08:35,320 --> 00:08:37,560 YOU MAY BE FAMILIAR WITH THE 161 00:08:37,560 --> 00:08:39,280 INTERNET ARCHIVE AND THEIR 162 00:08:39,280 --> 00:08:40,760 WAYBACK MACHINE. 163 00:08:40,760 --> 00:08:42,520 BUT THERE ARE SEVERAL OTHER 164 00:08:42,520 --> 00:08:43,280 PUBLIC ARCHIVES. 165 00:08:43,280 --> 00:08:45,080 SOME HAVE PARTICULAR GOALS FOR 166 00:08:45,080 --> 00:08:48,640 WHAT THEY CAPTURE SUCH AS THE UK 167 00:08:48,640 --> 00:08:59,680 WEB ARCHIVE. 168 00:09:05,920 --> 00:09:09,000 LET'S LOOK AT HOW WEB PAGES GET 169 00:09:09,000 --> 00:09:09,280 ARCHIVED. 170 00:09:09,280 --> 00:09:11,360 FIRST WE NEED TO LOOK AT HOW WEB 171 00:09:11,360 --> 00:09:12,760 PAGES ARE CONSTRUCTED. 172 00:09:12,760 --> 00:09:16,560 MOST WEB PAGES ARE MADE UP OF 173 00:09:16,560 --> 00:09:19,360 HTML SOURCE AND EMBEDDED 174 00:09:19,360 --> 00:09:22,640 RESOURCES LIKE CSS STYLE SHEETS 175 00:09:22,640 --> 00:09:26,280 AND JAVASCRIPT. 176 00:09:26,280 --> 00:09:28,200 EACH MUST BE DOWNLOADED 177 00:09:28,200 --> 00:09:28,640 SEPARATELY. 178 00:09:28,640 --> 00:09:31,880 THE CSS AND JAVASCRIPT ARE NOT 179 00:09:31,880 --> 00:09:33,120 DISPLAYED DIRECTLY ON THE PAGE 180 00:09:33,120 --> 00:09:35,800 BUT THEY CAN MODIFY HOW THE 181 00:09:35,800 --> 00:09:38,000 CONTENT APPEARS. 
182 00:09:38,000 --> 00:09:40,320 IN THE EXAMPLE I'VE SHOWN HERE, 183 00:09:40,320 --> 00:09:42,680 THE EMBEDDED RESOURCES SUCH 184 00:09:42,680 --> 00:09:44,680 AS IMAGES AND OTHER THINGS 185 00:09:44,680 --> 00:09:46,680 NEEDED TO BUILD THE PAGE ARE 186 00:09:46,680 --> 00:09:48,560 HIGHLIGHTED IN YELLOW AND LINKS TO 187 00:09:48,560 --> 00:09:50,080 OTHER WEB PAGES ARE HIGHLIGHTED 188 00:09:50,080 --> 00:09:50,680 IN GREEN. 189 00:09:50,680 --> 00:09:53,480 AND BOTH OF THESE WILL BE 190 00:09:53,480 --> 00:09:55,920 NECESSARY WHEN WE THINK ABOUT 191 00:09:55,920 --> 00:09:56,600 ARCHIVING. 192 00:09:56,600 --> 00:09:59,360 AND THIS IS A NECESSARILY 193 00:09:59,360 --> 00:10:01,760 SIMPLIFIED VIEW OF WEB PAGES. 194 00:10:01,760 --> 00:10:03,960 MOST ARE A COMPLEX MIX OF 195 00:10:03,960 --> 00:10:05,560 RESOURCES THAT ARE DRIVEN 196 00:10:05,560 --> 00:10:08,520 LARGELY BY JAVASCRIPT AND 197 00:10:08,520 --> 00:10:10,200 RESOURCES REQUESTED AFTER YOUR 198 00:10:10,200 --> 00:10:12,120 PAGE HAS BEEN RENDERED IN YOUR 199 00:10:12,120 --> 00:10:15,960 WEB BROWSER. 200 00:10:15,960 --> 00:10:19,640 SO WEB ARCHIVES EMPLOY WEB 201 00:10:19,640 --> 00:10:19,920 CRAWLERS. 202 00:10:19,920 --> 00:10:22,760 AND THE PROCESS IS STARTED WITH 203 00:10:22,760 --> 00:10:26,000 A SEED LIST OF URLs THAT GO 204 00:10:26,000 --> 00:10:28,520 INTO A QUEUE OR A FRONTIER. 205 00:10:28,520 --> 00:10:32,400 AND THE CRAWLER GRABS A URL AND 206 00:10:32,400 --> 00:10:33,840 DOWNLOADS THE RESOURCE FROM THE 207 00:10:33,840 --> 00:10:37,360 WEB AND SAVES THE HTTP HEADERS 208 00:10:37,360 --> 00:10:38,640 AND CONTENT IN THE WARC FILE 209 00:10:38,640 --> 00:10:40,080 FORMAT. 210 00:10:40,080 --> 00:10:41,840 AND THEN THE CRAWLER CHECKS 211 00:10:41,840 --> 00:10:45,480 TO SEE IF THE RESOURCE 212 00:10:45,480 --> 00:10:50,120 CONTAINS URLs OF EMBEDDED RESOURCES. 
213 00:10:50,120 --> 00:10:52,480 AND IF SO THOSE ARE PLACED BACK 214 00:10:52,480 --> 00:10:54,200 ON TO THE FRONTIER SO THEY 215 00:10:54,200 --> 00:10:56,760 CAN BE DOWNLOADED AS WELL AND IF 216 00:10:56,760 --> 00:10:58,840 THE CRAWLER IS CONFIGURED TO 217 00:10:58,840 --> 00:11:01,200 CRAWL OUTLINKS, MEANING THOSE 218 00:11:01,200 --> 00:11:03,200 PAGES THAT ARE LINKED TO FROM 219 00:11:03,200 --> 00:11:05,560 THE CURRENT PAGE, THEN THE URL 220 00:11:05,560 --> 00:11:07,920 FOR EACH WILL ALSO BE PLACED ON 221 00:11:07,920 --> 00:11:09,360 TO THE FRONTIER. 222 00:11:09,360 --> 00:11:11,160 CRAWLERS TYPICALLY HAVE A DEPTH 223 00:11:11,160 --> 00:11:13,800 SETTING TO LIMIT THE NUMBER OF 224 00:11:13,800 --> 00:11:15,600 SUCCESSIVE LINKS THAT WILL BE 225 00:11:15,600 --> 00:11:21,520 FOLLOWED AND DOWNLOADED. 226 00:11:21,520 --> 00:11:25,440 SO WARC IS THE STANDARD FORMAT 227 00:11:25,440 --> 00:11:25,960 USED 228 00:11:25,960 --> 00:11:29,080 TO STORE THE RESOURCES THAT A WEB 229 00:11:29,080 --> 00:11:33,520 ARCHIVE CRAWLER 230 00:11:33,520 --> 00:11:34,960 DOWNLOADS. 231 00:11:34,960 --> 00:11:38,520 THE KEY IS THAT IN THE WARC FILE 232 00:11:38,520 --> 00:11:41,080 THE CRAWLER STORES THE 233 00:11:41,080 --> 00:11:42,440 REQUESTS AND RESPONSES INCLUDING 234 00:11:42,440 --> 00:11:44,760 THE RESOURCE CONTENT FOR EVERY 235 00:11:44,760 --> 00:11:46,200 RESOURCE THAT YOU NEED TO BUILD 236 00:11:46,200 --> 00:11:48,320 A WEB PAGE AND THESE ARE THE 237 00:11:48,320 --> 00:11:51,080 FILES THAT ARE ACCESSED WHEN 238 00:11:51,080 --> 00:11:54,080 IT'S TIME TO REPLAY. 239 00:11:54,080 --> 00:11:56,800 SO WHEN A PAGE IS REPLAYED ONLY THE 240 00:11:56,800 --> 00:11:59,240 RESOURCES STORED IN WARC FILES 241 00:11:59,240 --> 00:12:01,160 ARE USED AND NOTHING FROM THE 242 00:12:01,160 --> 00:12:03,120 LIVE WEB IS ACCESSED. 
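[Editor's illustration] The crawl loop described above (seed list, frontier, embedded resources, outlinks, depth limit) can be sketched in a few lines of Python. This is a minimal sketch, not the code of any particular archive's crawler: the `TOY_WEB` table and the `crawl` function are hypothetical names invented for this example, the "web" is an in-memory dictionary rather than real HTTP, and a real crawler would also write each request and response to a WARC file instead of returning a list.

```python
from collections import deque

# Hypothetical in-memory "web" used instead of real HTTP requests:
# URL -> (URLs of embedded resources, URLs of outlinks).
TOY_WEB = {
    "http://a.example/": (["http://a.example/style.css"], ["http://b.example/"]),
    "http://a.example/style.css": ([], []),
    "http://b.example/": (["http://b.example/logo.png"], ["http://c.example/"]),
    "http://b.example/logo.png": ([], []),
    "http://c.example/": ([], []),
}


def crawl(seeds, max_depth, follow_outlinks=True, web=TOY_WEB):
    """Return the URLs captured, in the order they were downloaded."""
    # The frontier is a queue of (url, depth) pairs, started from the seed list.
    frontier = deque((url, 0) for url in seeds)
    seen = set(seeds)
    captured = []
    while frontier:
        url, depth = frontier.popleft()
        if url not in web:
            continue  # stands in for a failed download
        embedded, outlinks = web[url]
        captured.append(url)  # a real crawler would store a WARC record here
        # Embedded resources are needed to rebuild this page, so they go
        # back on the frontier at the same depth.
        for resource in embedded:
            if resource not in seen:
                seen.add(resource)
                frontier.append((resource, depth))
        # Outlinks are followed only if configured, and only up to max_depth.
        if follow_outlinks and depth < max_depth:
            for link in outlinks:
                if link not in seen:
                    seen.add(link)
                    frontier.append((link, depth + 1))
    return captured
```

With this toy data, a depth of 1 captures the seed page, its style sheet, the linked page and that page's image, but not the page two links away; a depth of 0 captures only the seed page and its style sheet.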
243 00:12:03,120 --> 00:12:05,920 THIS IS SHOWING THAT THE HTML 244 00:12:05,920 --> 00:12:09,000 SOURCE CODE AND THE EMBEDDED 245 00:12:09,000 --> 00:12:10,560 IMAGE ARE BOTH STORED IN THE 246 00:12:10,560 --> 00:12:15,360 SAME WARC FILE. 247 00:12:15,360 --> 00:12:18,160 MANY WEB ARCHIVES SET THEIR OWN 248 00:12:18,160 --> 00:12:20,040 SEED LIST BASED ON THEIR 249 00:12:20,040 --> 00:12:22,240 ORGANIZATION'S PRIORITIES. 250 00:12:22,240 --> 00:12:25,240 OTHER ARCHIVES ALLOW USERS TO 251 00:12:25,240 --> 00:12:26,480 SPECIFY ITEMS, 252 00:12:26,480 --> 00:12:28,680 WHICH WE CALL ON-DEMAND 253 00:12:28,680 --> 00:12:32,880 ARCHIVING. 254 00:12:32,880 --> 00:12:35,520 ARCHIVE-IT IS THE INTERNET ARCHIVE'S 255 00:12:35,520 --> 00:12:37,280 SUBSCRIPTION SERVICE FOR 256 00:12:37,280 --> 00:12:38,640 DEVELOPING WEB ARCHIVE 257 00:12:38,640 --> 00:12:39,440 COLLECTIONS. 258 00:12:39,440 --> 00:12:42,760 IT'S LARGELY USED BY MUSEUMS AND 259 00:12:42,760 --> 00:12:44,240 LIBRARIES AND GOVERNMENT 260 00:12:44,240 --> 00:12:46,520 AGENCIES INCLUDING NLM. 261 00:12:46,520 --> 00:12:49,080 THE COLLECTING ORGANIZATION 262 00:12:49,080 --> 00:12:50,440 CONTROLS WHAT GETS CRAWLED AND 263 00:12:50,440 --> 00:12:53,280 WHO HAS ACCESS TO THE ARCHIVED 264 00:12:53,280 --> 00:12:56,280 WEB PAGES WHICH WE CALL 265 00:12:56,280 --> 00:12:56,520 MEMENTOS. 266 00:12:56,520 --> 00:12:59,040 AND PUBLIC MEMENTOS ARE ALSO 267 00:12:59,040 --> 00:13:00,840 ADDED TO THE INTERNET ARCHIVE 268 00:13:00,840 --> 00:13:04,480 HOLDINGS. 269 00:13:04,480 --> 00:13:06,560 IT HAS A SAVE PAGE NOW FEATURE 270 00:13:06,560 --> 00:13:09,000 THAT ALLOWS ANY USER TO REQUEST 271 00:13:09,000 --> 00:13:11,480 THAT THEIR WEB CRAWLER ARCHIVE 272 00:13:11,480 --> 00:13:13,360 A WEB PAGE. 
273 00:13:13,360 --> 00:13:16,000 THE MEMENTO WILL BE AVAILABLE IN 274 00:13:16,000 --> 00:13:18,040 THE PUBLIC WAYBACK MACHINE AND 275 00:13:18,040 --> 00:13:20,640 THE INTERNET ARCHIVE FREQUENTLY 276 00:13:20,640 --> 00:13:24,040 HAS A MOTTO THAT SAYS IF YOU SEE 277 00:13:24,040 --> 00:13:25,040 SOMETHING SAVE SOMETHING. 278 00:13:25,040 --> 00:13:26,840 SO THEY ENCOURAGE THE USE OF 279 00:13:26,840 --> 00:13:31,240 THIS FEATURE. 280 00:13:31,240 --> 00:13:34,960 ARCHIVE.TODAY IS SIMILAR 281 00:13:34,960 --> 00:13:37,640 IN THAT ANYONE CAN SUBMIT A 282 00:13:37,640 --> 00:13:42,040 REQUEST FOR A PAGE TO BE CRAWLED 283 00:13:42,040 --> 00:13:43,600 AND THAT IS MADE AVAILABLE IN 284 00:13:43,600 --> 00:13:45,400 THEIR PUBLIC ARCHIVE. 285 00:13:45,400 --> 00:13:49,320 AND THEN CONIFER, WHICH I'LL TALK 286 00:13:49,320 --> 00:13:52,520 ABOUT NEXT, ALLOWS REGISTERED 287 00:13:52,520 --> 00:13:56,360 USERS TO COLLECT WEB PAGES. 288 00:13:56,360 --> 00:14:00,120 WHEN SPECIFYING THE URL TO 289 00:14:00,120 --> 00:14:02,560 CAPTURE USERS CAN ALSO SPECIFY 290 00:14:02,560 --> 00:14:04,040 WHICH WEB BROWSER THEY WANT TO 291 00:14:04,040 --> 00:14:07,040 USE AND USERS GET FIVE GIGABYTES 292 00:14:07,040 --> 00:14:09,160 OF FREE STORAGE AND THEY CONTROL 293 00:14:09,160 --> 00:14:11,200 THE ACCESS TO THE MEMENTOS BUT 294 00:14:11,200 --> 00:14:13,520 THERE IS NO PUBLIC SEARCH OR 295 00:14:13,520 --> 00:14:16,120 LOOKUP MECHANISM AS THERE IS 296 00:14:16,120 --> 00:14:20,400 WITH OTHER WEB ARCHIVES. 297 00:14:20,400 --> 00:14:23,800 CONIFER STARTED OUT AS THE 298 00:14:23,800 --> 00:14:26,040 WEBRECORDER PROJECT AND SOME PEOPLE 299 00:14:26,040 --> 00:14:28,840 MIGHT KNOW THE TOOL BY EITHER OF 300 00:14:28,840 --> 00:14:30,520 THOSE NAMES AND THERE ARE SOME 301 00:14:30,520 --> 00:14:32,160 UNIQUE THINGS ABOUT IT. 302 00:14:32,160 --> 00:14:38,280 CONIFER USES A WEB-BROWSER-BASED 303 00:14:38,280 --> 00:14:39,800 CRAWLER. 
304 00:14:39,800 --> 00:14:42,080 ANY JAVASCRIPT ON THE PAGE CAN 305 00:14:42,080 --> 00:14:43,640 BE EXECUTED BEFORE THE PAGE IS 306 00:14:43,640 --> 00:14:45,360 ARCHIVED AND THIS IS IMPORTANT 307 00:14:45,360 --> 00:14:47,320 BECAUSE JAVASCRIPT CAN CHANGE 308 00:14:47,320 --> 00:14:49,200 THE LAYOUT OF THE PAGE AND CAN 309 00:14:49,200 --> 00:14:50,760 ALSO REQUEST ADDITIONAL 310 00:14:50,760 --> 00:14:53,240 RESOURCES THAT MIGHT BE MISSED 311 00:14:53,240 --> 00:14:56,000 BY A TRADITIONAL WEB CRAWLER. 312 00:14:56,000 --> 00:14:58,200 SO EVEN THOUGH THIS CAN MORE 313 00:14:58,200 --> 00:15:00,640 FAITHFULLY CAPTURE DYNAMIC WEB 314 00:15:00,640 --> 00:15:03,040 PAGES MOST CRAWLERS DON'T USE 315 00:15:03,040 --> 00:15:04,600 THIS TECHNIQUE BECAUSE IT TAKES 316 00:15:04,600 --> 00:15:05,280 MUCH LONGER. 317 00:15:05,280 --> 00:15:07,880 THERE IS A TRADE-OFF BETWEEN 318 00:15:07,880 --> 00:15:09,680 SPEED AND FIDELITY. 319 00:15:09,680 --> 00:15:13,400 WITH CONIFER THE ARCHIVING IS 320 00:15:13,400 --> 00:15:15,360 DONE WHILE YOU VISIT THE PAGE. 321 00:15:15,360 --> 00:15:16,720 YOU CAN INTERACT WITH THE PAGE 322 00:15:16,720 --> 00:15:18,840 AND SOMETIMES THAT CAUSES MORE 323 00:15:18,840 --> 00:15:20,360 RESOURCES TO BE LOADED WHICH 324 00:15:20,360 --> 00:15:22,800 THEN CAN BE ARCHIVED AS 325 00:15:22,800 --> 00:15:23,000 WELL. 326 00:15:23,000 --> 00:15:25,160 AND YOU CAN ALSO ARCHIVE PAGES 327 00:15:25,160 --> 00:15:26,720 WHERE YOU WOULD NEED A LOGIN TO 328 00:15:26,720 --> 00:15:28,600 ACCESS. 329 00:15:28,600 --> 00:15:30,920 SO I WANT TO MENTION SOME OF OUR 330 00:15:30,920 --> 00:15:33,800 PREVIOUS WORK RELATED TO THIS. 331 00:15:33,800 --> 00:15:36,240 BACK IN 2012 WITH FUNDING FROM 332 00:15:36,240 --> 00:15:39,320 THE NEH WE DEVELOPED A TOOL 333 00:15:39,320 --> 00:15:41,640 CALLED WARCREATE 
334 00:15:41,640 --> 00:15:44,240 THAT PROVIDED BROWSER-BASED 335 00:15:44,240 --> 00:15:46,480 ARCHIVING USING A BROWSER 336 00:15:46,480 --> 00:15:47,800 EXTENSION AND THIS IS 337 00:15:47,800 --> 00:15:49,600 ACKNOWLEDGED AS AN INSPIRATION 338 00:15:49,600 --> 00:15:52,400 FOR SOME OF THESE LATER EFFORTS 339 00:15:52,400 --> 00:15:56,480 SUCH AS WEBRECORDER AND CONIFER. 340 00:15:56,480 --> 00:15:58,960 SO NOW THAT WE'VE LOOKED BRIEFLY 341 00:15:58,960 --> 00:16:01,040 AT HOW WEB PAGES ARE ARCHIVED 342 00:16:01,040 --> 00:16:03,040 THE OTHER END IS HOW WE CAN 343 00:16:03,040 --> 00:16:05,600 ACCESS THE WEB PAGES THAT HAVE 344 00:16:05,600 --> 00:16:06,840 ALREADY BEEN ARCHIVED. 345 00:16:06,840 --> 00:16:10,160 SO WEB ARCHIVES ARE NOT LIKE 346 00:16:10,160 --> 00:16:10,760 GOOGLE. 347 00:16:10,760 --> 00:16:13,400 YOU CANNOT OFTEN DO FREE TEXT 348 00:16:13,400 --> 00:16:14,160 SEARCH. 349 00:16:14,160 --> 00:16:17,400 MOST ARE INDEXED BY URL AND 350 00:16:17,400 --> 00:16:17,840 DATETIME, 351 00:16:17,840 --> 00:16:19,440 THE TIME AND DATE THAT THE PAGE 352 00:16:19,440 --> 00:16:21,120 WAS CAPTURED. 353 00:16:21,120 --> 00:16:23,720 SO THE FIRST INPUT IS THE URL 354 00:16:23,720 --> 00:16:25,440 AND FROM THERE YOU CAN USUALLY 355 00:16:25,440 --> 00:16:27,680 GET A LIST OF DATETIMES WHEN THAT 356 00:16:27,680 --> 00:16:29,000 WEB PAGE WAS CAPTURED. 357 00:16:29,000 --> 00:16:31,000 SO HERE I'M SHOWING THE QUERIES 358 00:16:31,000 --> 00:16:35,080 FOR CAPTURES OR MEMENTOS OF THE 359 00:16:35,080 --> 00:16:37,360 HISTORY OF MEDICINE DIVISION'S 360 00:16:37,360 --> 00:16:39,440 MAIN WEB PAGE FROM THE WAYBACK 361 00:16:39,440 --> 00:16:44,000 MACHINE AND FROM ARCHIVE.TODAY. 362 00:16:44,000 --> 00:16:47,680 SO THE WAYBACK MACHINE REPORTS 363 00:16:47,680 --> 00:16:49,440 ALMOST 1500 MEMENTOS OF THE WEB 364 00:16:49,440 --> 00:16:55,160 PAGE BETWEEN 1998 AND 2022. 
365 00:16:55,160 --> 00:16:59,040 ARCHIVE.TODAY ONLY HAS TWO, ONE 366 00:16:59,040 --> 00:17:01,680 FROM 2012 AND ONE FROM 2014. 367 00:17:01,680 --> 00:17:04,560 REMEMBER THAT ARCHIVE.TODAY IS AN 368 00:17:04,560 --> 00:17:07,800 ON-DEMAND ARCHIVING SERVICE SO 369 00:17:07,800 --> 00:17:10,240 IT DOESN'T DO CRAWLS ON ITS OWN 370 00:17:10,240 --> 00:17:12,360 SO IF YOU WOULD LIKE TO ADD A 371 00:17:12,360 --> 00:17:16,240 MEMENTO TO THE ARCHIVE.TODAY 372 00:17:16,240 --> 00:17:19,520 ARCHIVE YOU CAN SUBMIT A 373 00:17:19,520 --> 00:17:22,520 REQUEST THAT IT BE CRAWLED. 374 00:17:22,520 --> 00:17:24,880 SO BECAUSE THE WAYBACK MACHINE 375 00:17:24,880 --> 00:17:26,960 OFTEN HAS THOUSANDS OF MEMENTOS 376 00:17:26,960 --> 00:17:29,360 FOR A WEB PAGE THEY EMPLOY A 377 00:17:29,360 --> 00:17:31,680 CALENDAR-BASED VIEW FOR ACCESS. 378 00:17:31,680 --> 00:17:34,240 SO FIRST YOU SELECT THE YEAR AND 379 00:17:34,240 --> 00:17:37,360 IN THIS CASE I'VE SELECTED 1999 380 00:17:37,360 --> 00:17:38,880 AND THEN ON THE CALENDAR THERE 381 00:17:38,880 --> 00:17:41,800 IS A BUBBLE FOR EACH DAY THE 382 00:17:41,800 --> 00:17:44,560 PAGE WAS ARCHIVED. 383 00:17:44,560 --> 00:17:46,800 LARGER BUBBLES MEAN THERE WERE 384 00:17:46,800 --> 00:17:50,120 MORE MEMENTOS CAPTURED ON THAT 385 00:17:50,120 --> 00:17:51,960 DAY. 386 00:17:51,960 --> 00:17:55,080 FOR MARCH 2, 1999 I'M GOING TO DROP 387 00:17:55,080 --> 00:17:57,200 DOWN AND SELECT A PARTICULAR 388 00:17:57,200 --> 00:17:59,480 DATETIME AND THESE TIMES ARE IN 389 00:17:59,480 --> 00:18:05,360 24 HOUR FORMAT BASED ON GMT. 390 00:18:05,360 --> 00:18:13,600 SO I CLICKED ON 10:07. 391 00:18:13,600 --> 00:18:15,960 AND SO HERE IS WHAT THE HISTORY 392 00:18:15,960 --> 00:18:17,520 OF MEDICINE DIVISION WEB PAGE 393 00:18:17,520 --> 00:18:20,520 LOOKED LIKE IN MARCH 1999. 394 00:18:20,520 --> 00:18:23,040 SO WHILE WE'RE HERE LET ME 395 00:18:23,040 --> 00:18:29,000 UNPACK THIS URL AT THE BOTTOM. 
396 00:18:29,000 --> 00:18:31,880 THIS IS THE PREFIX FOR ALL 397 00:18:31,880 --> 00:18:33,600 INTERNET ARCHIVE WAYBACK 398 00:18:33,600 --> 00:18:35,800 MACHINE MEMENTOS. 399 00:18:35,800 --> 00:18:38,560 THE NEXT 14 DIGITS REPRESENT THE 400 00:18:38,560 --> 00:18:41,680 DATETIME. 401 00:18:41,680 --> 00:18:44,560 1999, 03 FOR MARCH AND 02 AND 402 00:18:44,560 --> 00:18:49,360 THEN THE LAST SIX DIGITS ARE THE 403 00:18:49,360 --> 00:18:50,200 TIME, 404 00:18:50,200 --> 00:18:53,200 100723, AND AT THE END IS THE URL 405 00:18:53,200 --> 00:18:54,840 OF THAT ORIGINAL RESOURCE. 406 00:18:54,840 --> 00:18:57,360 SO IF YOU WANTED TO SHARE THIS 407 00:18:57,360 --> 00:18:59,920 PARTICULAR MEMENTO THIS WHOLE 408 00:18:59,920 --> 00:19:01,960 URL IS WHAT YOU WOULD USE TO 409 00:19:01,960 --> 00:19:03,520 ACCESS THAT FROM THE WAYBACK 410 00:19:03,520 --> 00:19:08,560 MACHINE. 411 00:19:08,560 --> 00:19:11,320 BUT WHAT IF THE ARCHIVE YOU 412 00:19:11,320 --> 00:19:12,560 TRIED DOESN'T HAVE THE WEB PAGE 413 00:19:12,560 --> 00:19:13,600 AT THE TIME THAT YOU'RE 414 00:19:13,600 --> 00:19:14,600 INTERESTED IN? 415 00:19:14,600 --> 00:19:16,360 WHAT IF YOU WANTED A MEMENTO OF 416 00:19:16,360 --> 00:19:20,160 THE HISTORY OF MEDICINE DIVISION 417 00:19:20,160 --> 00:19:23,160 FROM 2020 AND YOU TRIED 418 00:19:23,160 --> 00:19:24,400 ARCHIVE.TODAY? 419 00:19:24,400 --> 00:19:27,040 THEY ONLY HAVE 2012 AND 2014 SO 420 00:19:27,040 --> 00:19:30,240 YOU CAN TRY THE QUERY AT ANOTHER 421 00:19:30,240 --> 00:19:30,720 ARCHIVE. 422 00:19:30,720 --> 00:19:33,680 BUT WITH MORE AND MORE WEB 423 00:19:33,680 --> 00:19:37,480 ARCHIVES THIS PROCESS CAN GET 424 00:19:37,480 --> 00:19:42,760 TEDIOUS SO TO HELP WE DEVELOPED 425 00:19:42,760 --> 00:19:45,320 THE MEMENTO PROTOCOL. 
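[Editor's illustration] The Wayback Machine URL structure unpacked above (prefix, datetime digits, original URL) can be taken apart mechanically. The following is a minimal Python sketch under stated assumptions: the function name `split_wayback_url` is invented for this example, the original-resource URL `http://www.example.org/` is a placeholder because the talk does not spell out the real one, and the sketch only handles the plain form with a 14-digit `YYYYMMDDhhmmss` datetime and no modifier suffix after the digits.

```python
from datetime import datetime, timezone

# Prefix shared by Wayback Machine memento URLs.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def split_wayback_url(memento_url):
    """Split a Wayback Machine memento URL into (capture datetime, original URL)."""
    if not memento_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback Machine memento URL")
    rest = memento_url[len(WAYBACK_PREFIX):]
    # Everything up to the first slash is the datetime; the rest is the
    # URL of the original resource.
    stamp, _, original_url = rest.partition("/")
    if len(stamp) != 14 or not stamp.isdigit():
        raise ValueError("expected a 14-digit YYYYMMDDhhmmss datetime")
    # Capture times are expressed in GMT (UTC).
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original_url


# Example using the datetime from the talk and a placeholder original URL.
captured, original_url = split_wayback_url(
    "https://web.archive.org/web/19990302100723/http://www.example.org/"
)
```

Here `captured` is March 2, 1999 at 10:07:23 GMT and `original_url` is the placeholder address.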
426 00:19:45,320 --> 00:19:47,640 THIS IS AN HTTP EXTENSION THAT AMONG 427 00:19:47,640 --> 00:19:49,760 OTHER THINGS ALLOWS A CLIENT TO 428 00:19:49,760 --> 00:19:54,080 OBTAIN A TIME MAP OR LIST OF ALL 429 00:19:54,080 --> 00:19:56,400 MEMENTOS FOR A URL. 430 00:19:56,400 --> 00:19:58,520 AND THERE ARE SEVERAL BROWSER 431 00:19:58,520 --> 00:20:00,640 EXTENSIONS BUILT ON TOP OF THE 432 00:20:00,640 --> 00:20:02,280 MEMENTO PROTOCOL, BUT YOU 433 00:20:02,280 --> 00:20:06,800 CAN USE IT WITHOUT A BROWSER 434 00:20:06,800 --> 00:20:07,240 EXTENSION. 435 00:20:07,240 --> 00:20:09,360 SO I'M SHOWING YOU AN EXAMPLE 436 00:20:09,360 --> 00:20:11,280 HERE FOR QUERYING FOR THE 437 00:20:11,280 --> 00:20:13,320 HISTORY OF MEDICINE DIVISION 438 00:20:13,320 --> 00:20:16,640 PAGE FROM JUNE 2020. 439 00:20:16,640 --> 00:20:22,120 AND THE TOP THREE RESULTS COME 440 00:20:22,120 --> 00:20:28,240 FROM THE PORTUGUESE ARCHIVE. 441 00:20:28,240 --> 00:20:33,360 AND THEY TELL YOU EXACTLY HOW 442 00:20:33,360 --> 00:20:34,800 DIFFERENT THE VERSION THAT THEY 443 00:20:34,800 --> 00:20:36,480 HAVE IS FROM THE PAGE THAT YOU 444 00:20:36,480 --> 00:20:41,200 REQUESTED. 445 00:20:41,200 --> 00:20:43,320 SO OUR RESEARCH GROUP HAS BUILT 446 00:20:43,320 --> 00:20:49,920 A MEMENTO AGGREGATOR THAT RETURNS 447 00:20:49,920 --> 00:20:53,760 ALL OF THE MEMENTOS FOR A URL 448 00:20:53,760 --> 00:21:01,400 AND THIS IS BUILT AS AN API. 449 00:21:01,400 --> 00:21:03,800 THE EXAMPLE HERE SHOWS A QUERY 450 00:21:03,800 --> 00:21:05,920 FOR ALL OF THE MEMENTOS FOR THE 451 00:21:05,920 --> 00:21:07,200 HISTORY OF MEDICINE DIVISION 452 00:21:07,200 --> 00:21:07,720 PAGE. 453 00:21:07,720 --> 00:21:10,360 SO I'LL MAKE THIS LARGER. 454 00:21:10,360 --> 00:21:11,960 YOU CAN SEE IN THE SHORTENED 455 00:21:11,960 --> 00:21:13,520 LIST THAT THERE ARE MEMENTOS 456 00:21:13,520 --> 00:21:17,640 FROM BOTH THE INTERNET ARCHIVE, 457 00:21:17,640 --> 00:21:20,840 WEB.ARCHIVE.ORG, AND THE 
458 00:21:20,840 --> 00:21:22,320 LIBRARY OF CONGRESS. 459 00:21:22,320 --> 00:21:24,920 AND YOU CAN ALSO SEE THAT THE 460 00:21:24,920 --> 00:21:27,520 FORMAT OF THE LIBRARY OF 461 00:21:27,520 --> 00:21:29,320 CONGRESS ARCHIVE IS SIMILAR TO 462 00:21:29,320 --> 00:21:31,600 THAT OF THE INTERNET ARCHIVE. 463 00:21:31,600 --> 00:21:34,880 NOW NOT ALL ARCHIVES USE THIS 464 00:21:34,880 --> 00:21:36,000 FORMAT. 465 00:21:36,000 --> 00:21:40,320 FOR THOSE THAT DO IT CAN PROVIDE 466 00:21:40,320 --> 00:21:42,000 HELPFUL INFORMATION ON THE 467 00:21:42,000 --> 00:21:43,880 MEMENTO THAT YOU ARE VIEWING. 468 00:21:43,880 --> 00:21:48,120 ONE TOOL THAT WE HAVE TO HELP 469 00:21:48,120 --> 00:21:50,000 USERS DISCOVER RELEVANT MEMENTOS 470 00:21:50,000 --> 00:21:53,360 IS TO CREATE CURATED COLLECTIONS 471 00:21:53,360 --> 00:21:55,400 BASED ON A TOPIC. 472 00:21:55,400 --> 00:21:59,600 CURATORS CAN PROVIDE SEED URLs AND 473 00:21:59,600 --> 00:22:03,360 ADDITIONAL METADATA. 474 00:22:03,360 --> 00:22:07,000 THE EXAMPLE I'M SHOWING HERE IS 475 00:22:07,000 --> 00:22:08,920 NLM'S GLOBAL HEALTH EVENTS WEB 476 00:22:08,920 --> 00:22:11,600 ARCHIVE COLLECTION WHICH IS 477 00:22:11,600 --> 00:22:12,880 HOSTED AT ARCHIVE-IT. 478 00:22:12,880 --> 00:22:16,440 THIS WAS STARTED IN 2014 AND 479 00:22:16,440 --> 00:22:18,520 CONTAINS OVER 19,000 SEED 480 00:22:18,520 --> 00:22:19,960 URLs. 481 00:22:19,960 --> 00:22:22,720 THE MEMBERS OF THE WEB 482 00:22:22,720 --> 00:22:26,040 COLLECTING GROUP SPEND A LOT OF TIME 483 00:22:26,040 --> 00:22:27,520 EVALUATING SEEDS TO BE ADDED TO 484 00:22:27,520 --> 00:22:29,280 THIS COLLECTION. 
485 00:22:29,280 --> 00:22:32,040 ONE OF OUR GROUP'S PROJECTS 486 00:22:32,040 --> 00:22:34,480 INVOLVES SEEING IF WE CAN MINE 487 00:22:34,480 --> 00:22:36,920 QUALITY SEEDS FROM SOCIAL MEDIA 488 00:22:36,920 --> 00:22:39,480 POSTS TO HELP BOOTSTRAP SOME OF 489 00:22:39,480 --> 00:22:43,120 THESE TASKS AND THIS PROJECT WAS 490 00:22:43,120 --> 00:22:46,120 GREATLY HELPED BY DISCUSSIONS 491 00:22:46,120 --> 00:22:47,800 WITH CHRISTIE MOFFATT. 492 00:22:47,800 --> 00:22:50,040 WE LOOKED AT LINKS THAT WERE 493 00:22:50,040 --> 00:22:52,240 POSTED IN SOCIAL MEDIA ABOUT 494 00:22:52,240 --> 00:22:54,120 SPECIFIC TOPICS AND WE CONSIDERED 495 00:22:54,120 --> 00:22:56,720 THE DOMAIN OF THE LINK, THE 496 00:22:56,720 --> 00:22:58,080 POPULARITY OF THE POST AUTHOR, 497 00:22:58,080 --> 00:23:00,840 IF IT WAS A LOCAL MEDIA SOURCE 498 00:23:00,840 --> 00:23:03,920 AND SEVERAL OTHER FACTORS. 499 00:23:03,920 --> 00:23:05,480 ALTHOUGH THIS IS A PROJECT THAT 500 00:23:05,480 --> 00:23:07,040 FOCUSED ON THE COLLECTION SIDE 501 00:23:07,040 --> 00:23:09,600 MOST OF OUR OTHER RESEARCH 502 00:23:09,600 --> 00:23:11,280 FOCUSES ON WHAT HAPPENS ONCE A 503 00:23:11,280 --> 00:23:16,120 WEB PAGE HAS BEEN ARCHIVED. 504 00:23:16,120 --> 00:23:20,840 AS I MENTIONED EARLIER ARCHIVE-IT 505 00:23:20,840 --> 00:23:26,280 IS A SERVICE USED BY MANY 506 00:23:26,280 --> 00:23:29,560 ORGANIZATIONS. 507 00:23:29,560 --> 00:23:31,480 THIS ALLOWS A BIT MORE FILTERING 508 00:23:31,480 --> 00:23:32,360 AND BROWSING. 509 00:23:32,360 --> 00:23:35,120 BUT COLLECTIONS CAN STILL GROW 510 00:23:35,120 --> 00:23:37,240 QUITE LARGE SO BROWSING AT TIMES 511 00:23:37,240 --> 00:23:42,200 CAN STILL BE DIFFICULT. 512 00:23:42,200 --> 00:23:45,120 ONE THING THAT CAN MAKE LARGE 513 00:23:45,120 --> 00:23:47,880 COLLECTIONS HARD TO NAVIGATE IS 514 00:23:47,880 --> 00:23:50,200 THAT FOR EVERY SEED URL THERE 515 00:23:50,200 --> 00:23:54,040 ARE ONE OR MORE MEMENTOS. 
516 00:23:54,040 --> 00:23:56,720 CURATORS CAN CHOOSE TO HAVE THE 517 00:23:56,720 --> 00:23:58,600 SEEDS CRAWLED AT CERTAIN 518 00:23:58,600 --> 00:23:59,360 INTERVALS. 519 00:23:59,360 --> 00:24:01,640 SO EVEN IF THE SEED LIST DOESN'T 520 00:24:01,640 --> 00:24:03,320 GROW THE NUMBER OF MEMENTOS IN 521 00:24:03,320 --> 00:24:06,720 THE COLLECTION CAN STILL GROW. 522 00:24:06,720 --> 00:24:08,720 IN 2018 WE DID A STUDY OF THE 523 00:24:08,720 --> 00:24:11,520 SIZE OF ARCHIVE-IT COLLECTIONS IN 524 00:24:11,520 --> 00:24:14,120 TERMS OF SEEDS AND MEMENTOS AND 525 00:24:14,120 --> 00:24:16,200 HOW THEY GROW OVER TIME. 526 00:24:16,200 --> 00:24:18,440 WE IDENTIFIED NINE DIFFERENT 527 00:24:18,440 --> 00:24:25,240 GROWTH PATTERNS. 528 00:24:25,240 --> 00:24:29,160 THE EXAMPLE HERE IS WHERE ALMOST 529 00:24:29,160 --> 00:24:31,480 80% OF THE SEED URLs WERE 530 00:24:31,480 --> 00:24:33,640 ADDED FAIRLY EARLY IN THE 531 00:24:33,640 --> 00:24:35,000 COLLECTION'S LIFETIME AND A 532 00:24:35,000 --> 00:24:37,360 CRAWLING STRATEGY THAT HAD 533 00:24:37,360 --> 00:24:39,120 MEMENTOS BEING ADDED 534 00:24:39,120 --> 00:24:39,560 CONTINUOUSLY. 535 00:24:39,560 --> 00:24:43,240 THE GREEN LINE IS THE SEEDS AND THE RED LINE IS THE MEMENTOS, SO WHEN THERE IS A 536 00:24:43,240 --> 00:24:44,520 HORIZONTAL GREEN LINE THAT MEANS 537 00:24:44,520 --> 00:24:46,400 THAT NOTHING IS BEING ADDED TO 538 00:24:46,400 --> 00:24:48,200 THE SEED LIST AND AT THE SAME 539 00:24:48,200 --> 00:24:49,840 TIME WE SEE THE RED LINE 540 00:24:49,840 --> 00:24:52,400 REPRESENTING THE NUMBER OF 541 00:24:52,400 --> 00:24:55,360 MEMENTOS GROWING LINEARLY. 542 00:24:55,360 --> 00:24:56,920 THOSE SEEDS WERE CONTINUING TO 543 00:24:56,920 --> 00:24:59,040 BE CRAWLED AND NEW MEMENTOS 544 00:24:59,040 --> 00:25:01,640 ADDED. 
545 00:25:01,640 --> 00:25:03,840 THE INTERNET ARCHIVE'S WAYBACK 546 00:25:03,840 --> 00:25:05,840 MACHINE IS TOO LARGE TO OFFER 547 00:25:05,840 --> 00:25:07,960 FULL TEXT SEARCH ON THE ENTIRE 548 00:25:07,960 --> 00:25:10,320 ARCHIVE BUT SOME SMALLER 549 00:25:10,320 --> 00:25:13,120 ARCHIVES INCLUDING ARCHIVE-IT DO 550 00:25:13,120 --> 00:25:14,960 OFFER FULL TEXT SEARCH. 551 00:25:14,960 --> 00:25:16,840 SEARCHING THROUGH TIME CAN BE 552 00:25:16,840 --> 00:25:17,840 TRICKY. 553 00:25:17,840 --> 00:25:21,760 WHAT IF THE SEARCH TERM APPEARS 554 00:25:21,760 --> 00:25:24,240 IN ONLY SOME ITEMS? 555 00:25:24,240 --> 00:25:26,560 HOW SHOULD THAT BE INDICATED? 556 00:25:26,560 --> 00:25:30,520 IN THIS EXAMPLE I SEARCHED FOR THE 557 00:25:30,520 --> 00:25:32,280 PHRASE MASK REQUIRED. 558 00:25:32,280 --> 00:25:34,760 ONE OF THE RESULTS WAS THE HOME 559 00:25:34,760 --> 00:25:37,200 PAGE FOR NEW YORK STATE'S 560 00:25:37,200 --> 00:25:37,720 COVID RESPONSE. 561 00:25:37,720 --> 00:25:40,280 THE MEMENTO RETURNED IN THE 562 00:25:40,280 --> 00:25:45,360 SEARCH RESULTS IS FROM DECEMBER 23, 563 00:25:45,360 --> 00:25:47,640 2021. 564 00:25:47,640 --> 00:25:50,520 BUT LOOKING AT OTHER MEMENTOS 565 00:25:50,520 --> 00:25:52,440 AROUND THAT DATE REVEALS THAT 566 00:25:52,440 --> 00:25:54,160 THE PHRASE FIRST APPEARED IN THE 567 00:25:54,160 --> 00:25:57,200 MEMENTOS OF THE WEB PAGE ON 568 00:25:57,200 --> 00:26:00,800 DECEMBER 16 AND FURTHER ALTHOUGH 569 00:26:00,800 --> 00:26:02,440 THE PHRASE NO LONGER APPEARS IN 570 00:26:02,440 --> 00:26:05,600 THE HEADLINE ON DECEMBER 30 IT'S 571 00:26:05,600 --> 00:26:08,040 STILL PRESENT LOWER ON THE WEB 572 00:26:08,040 --> 00:26:10,440 PAGE SO WHICH IS THE MOST 573 00:26:10,440 --> 00:26:12,520 APPROPRIATE DATETIME TO RETURN? 
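One way to make the "which datetime should be returned" question concrete is to scan a page's mementos in capture order and record when a phrase appears or disappears. The sketch below is only an illustration; the function name and the sample page text are assumptions and are not taken from the actual New York State captures.

```python
from datetime import date


def phrase_transitions(mementos, phrase):
    """Return (capture_date, "added" or "removed") events for `phrase`.

    `mementos` is a list of (capture_date, page_text) pairs for one URL.
    Matching is a plain case-insensitive substring test.
    """
    needle = phrase.lower()
    events = []
    present = False
    for day, text in sorted(mementos, key=lambda m: m[0]):
        now_present = needle in text.lower()
        if now_present != present:
            events.append((day, "added" if now_present else "removed"))
            present = now_present
    return events


# Hypothetical page text for four weekly captures of one page.
mementos = [
    (date(2021, 12, 9), "Get vaccinated. Get boosted."),
    (date(2021, 12, 16), "Mask required in indoor public places."),
    (date(2021, 12, 23), "Mask required in indoor public places."),
    (date(2021, 12, 30), "Testing sites. Reminder: mask required indoors."),
]
events = phrase_transitions(mementos, "mask required")
print(events)
```

A search interface could return the "added" capture instead of whichever capture the index happened to rank first, though which choice is best depends on the research question.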
574 00:26:12,520 --> 00:26:14,880 ONE OF OUR MASTER'S STUDENTS IS 575 00:26:14,880 --> 00:26:16,960 EXPLORING DIFFERENT WAYS TO 576 00:26:16,960 --> 00:26:18,440 PRESENT SEARCH RESULTS FOR WEB 577 00:26:18,440 --> 00:26:19,960 ARCHIVES INCLUDING ADDING THE 578 00:26:19,960 --> 00:26:22,440 ABILITY TO SEARCH FOR ADDED OR 579 00:26:22,440 --> 00:26:24,000 DELETED TERMS OR PHRASES AND 580 00:26:24,000 --> 00:26:26,040 THIS WOULD ALLOW RESEARCHERS TO 581 00:26:26,040 --> 00:26:27,600 SEARCH FOR WHEN PAGES HAVE 582 00:26:27,600 --> 00:26:29,360 CHANGED. 583 00:26:29,360 --> 00:26:32,040 SO I'M SHOWING A PROTOTYPE OF A 584 00:26:32,040 --> 00:26:34,400 SEARCH INTERFACE THAT SHE 585 00:26:34,400 --> 00:26:35,760 DEVELOPED THAT ANIMATES THE 586 00:26:35,760 --> 00:26:36,480 SEARCH CHANGE. 587 00:26:36,480 --> 00:26:38,720 WE SEARCH FOR THE ADDITION OF 588 00:26:38,720 --> 00:26:40,440 THE PHRASE NONCLOTH AND THIS 589 00:26:40,440 --> 00:26:46,200 EXAMPLE REFLECTS UPDATED 590 00:26:46,200 --> 00:26:51,240 GUIDANCE. 591 00:26:51,240 --> 00:26:53,200 ANOTHER WAY TO FILTER DOWN A 592 00:26:53,200 --> 00:26:54,960 COLLECTION IS TO USE FACETS. 593 00:26:54,960 --> 00:26:57,640 THESE ARE ALONG THE SIDE. 594 00:26:57,640 --> 00:26:59,800 IN ARCHIVE-IT THIS IS PROVIDED 595 00:26:59,800 --> 00:27:01,920 IN THE PANEL ON THE LEFT HERE. 596 00:27:01,920 --> 00:27:07,400 HOWEVER THE PROBLEM WITH 597 00:27:07,400 --> 00:27:10,040 FACETS IS THEY DEPEND ON CURATED 598 00:27:10,040 --> 00:27:13,720 DATA. WE FOUND THAT -- 599 00:27:13,720 --> 00:27:16,520 TYPICALLY THE MORE SEEDS A 600 00:27:16,520 --> 00:27:19,080 COLLECTION HAS THE FEWER 601 00:27:19,080 --> 00:27:20,680 METADATA FIELDS IT HAS AND THAT 602 00:27:20,680 --> 00:27:22,480 IS WHAT WE FOUND IN THIS GRAPH 603 00:27:22,480 --> 00:27:24,000 ON THE SIDE. 604 00:27:24,000 --> 00:27:28,640 THE SEEDS ARE ON THE X AXIS AND 605 00:27:28,640 --> 00:27:31,720 THE METADATA ON THE Y. 
606 00:27:31,720 --> 00:27:33,800 THIS MAKES SENSE BECAUSE IT CAN 607 00:27:33,800 --> 00:27:40,280 BE A TIME-CONSUMING PROCESS. 608 00:27:40,280 --> 00:27:44,480 HOWEVER I HAVE TO RECOGNIZE THE 609 00:27:44,480 --> 00:27:45,240 WORKING GROUP. 610 00:27:45,240 --> 00:27:47,160 BECAUSE HERE IS A COLLECTION 611 00:27:47,160 --> 00:27:49,800 WITH THOUSANDS OF SEEDS THAT 612 00:27:49,800 --> 00:27:52,480 ALSO HAS TONS OF GREAT METADATA 613 00:27:52,480 --> 00:27:55,080 WHICH ALLOWS FOR DETAILED 614 00:27:55,080 --> 00:27:56,080 FACETING. 615 00:27:56,080 --> 00:27:59,200 SO, I'VE FILTERED THE GLOBAL 616 00:27:59,200 --> 00:28:00,320 HEALTH EVENTS COLLECTION TO 617 00:28:00,320 --> 00:28:03,520 FOCUS ON THE COVID-19 OUTBREAK 618 00:28:03,520 --> 00:28:06,200 LOOKING FOR ENGLISH LANGUAGE 619 00:28:06,200 --> 00:28:10,960 ITEMS THAT REFERENCE VULNERABLE 620 00:28:10,960 --> 00:28:12,240 POPULATIONS AND THIS BRINGS IT 621 00:28:12,240 --> 00:28:14,160 DOWN TO JUST OVER A THOUSAND 622 00:28:14,160 --> 00:28:14,600 RESULTS. 623 00:28:14,600 --> 00:28:16,600 YOU CAN ADD MORE FACETS OR 624 00:28:16,600 --> 00:28:18,360 PERFORM A FULL TEXT SEARCH ON 625 00:28:18,360 --> 00:28:19,880 THIS FILTERED DATA. 626 00:28:19,880 --> 00:28:22,800 WHAT I REFER TO AS THE COVID-19 627 00:28:22,800 --> 00:28:24,600 COLLECTION IS THE CONTENTS OF 628 00:28:24,600 --> 00:28:29,680 THIS GROUP LABELED CORONAVIRUS 629 00:28:29,680 --> 00:28:39,960 DISEASE COVID-19 OUTBREAK. 630 00:28:39,960 --> 00:28:43,560 I WANT TO POINT OUT A COUPLE OF 631 00:28:43,560 --> 00:28:47,080 NICE ARTICLES. 632 00:28:47,080 --> 00:28:50,480 IT'S ONE THING TO COLLECT A LOT 633 00:28:50,480 --> 00:28:52,120 OF WEB PAGES BUT IT'S ANOTHER 634 00:28:52,120 --> 00:28:54,840 EFFORT TO ENSURE THAT QUALITY 635 00:28:54,840 --> 00:28:57,080 METADATA IS APPLIED TO THE SEEDS 636 00:28:57,080 --> 00:29:01,200 SO THAT RESEARCHERS CAN FIND THE 637 00:29:01,200 --> 00:29:03,280 INFORMATION THAT THEY NEED. 
638 00:29:03,280 --> 00:29:05,960 TO HIGHLIGHT THE IMPORTANCE 639 00:29:05,960 --> 00:29:09,160 LET'S LOOK AT A COUPLE OF THE 640 00:29:09,160 --> 00:29:10,080 SEEDS. 641 00:29:10,080 --> 00:29:18,000 THE NEW MEXICO DEPARTMENT OF 642 00:29:18,000 --> 00:29:21,440 HEALTH -- SHOWED A TOTAL OF 10 643 00:29:21,440 --> 00:29:23,480 POSITIVE CASES. 644 00:29:23,480 --> 00:29:27,360 BY SEPTEMBER 2020 THERE WERE 645 00:29:27,360 --> 00:29:29,360 OVER 27,000. 646 00:29:29,360 --> 00:29:35,320 ONE YEAR LATER IN SEPTEMBER 2021 647 00:29:35,320 --> 00:29:38,240 ALMOST 250,000 POSITIVE CASES. 648 00:29:38,240 --> 00:29:41,960 AND BY MARCH 15 TWO YEARS INTO 649 00:29:41,960 --> 00:29:43,640 THE PANDEMIC NEW MEXICO HAD OVER 650 00:29:43,640 --> 00:29:45,760 HALF A MILLION POSITIVE CASES. 651 00:29:45,760 --> 00:29:48,320 NOW AS AN ASIDE THIS MEMENTO 652 00:29:48,320 --> 00:29:49,520 SHOWS SOMETHING THAT YOU'LL FIND 653 00:29:49,520 --> 00:29:51,480 IF YOU SPEND ENOUGH TIME 654 00:29:51,480 --> 00:29:53,720 EXPLORING WEB ARCHIVES. 655 00:29:53,720 --> 00:29:56,280 THERE WERE FOUR CSS STYLE FILES 656 00:29:56,280 --> 00:29:58,080 THAT WERE USED TO BUILD THIS 657 00:29:58,080 --> 00:29:59,840 PAGE THAT HAPPENED TO NOT BE 658 00:29:59,840 --> 00:30:01,520 ARCHIVED SO THE REPLAY OF THE 659 00:30:01,520 --> 00:30:03,160 PAGE IS AFFECTED. 660 00:30:03,160 --> 00:30:04,720 THE INFORMATION IS STILL THERE 661 00:30:04,720 --> 00:30:07,640 BUT IT DOES NOT LOOK LIKE WHAT 662 00:30:07,640 --> 00:30:11,480 APPEARED ON THE LIVE WEB. 663 00:30:11,480 --> 00:30:13,520 AND NOW TODAY NEW MEXICO'S 664 00:30:13,520 --> 00:30:15,400 DEPARTMENT OF HEALTH COVID PAGE 665 00:30:15,400 --> 00:30:17,520 HAS TRANSITIONED AWAY FROM 666 00:30:17,520 --> 00:30:19,280 REPORTING POSITIVE CASE NUMBERS. 667 00:30:19,280 --> 00:30:21,920 THEY ARE STILL HERE IN THIS 668 00:30:21,920 --> 00:30:23,600 BANNER UP AT THE TOP BUT THEY 669 00:30:23,600 --> 00:30:24,960 ARE REALLY SMALL. 
670 00:30:24,960 --> 00:30:27,200 BY LOOKING AT OTHER MEMENTOS 671 00:30:27,200 --> 00:30:29,000 DURING THIS TIME PERIOD I FOUND 672 00:30:29,000 --> 00:30:31,120 THAT THIS REDESIGN HAPPENED 673 00:30:31,120 --> 00:30:34,840 BETWEEN MARCH AND MAY 2022. 674 00:30:34,840 --> 00:30:39,960 THIS WEB PAGE WAS CAPTURED IN 675 00:30:39,960 --> 00:30:45,080 NLM's -- YOU WOULD HAVE TO GO 676 00:30:45,080 --> 00:30:47,320 TO THE CALENDAR INTERFACE AND 677 00:30:47,320 --> 00:30:51,760 SELECT THE YEAR AND HOVER ON THE 678 00:30:51,760 --> 00:30:52,320 DATE. 679 00:30:52,320 --> 00:30:54,960 AND FOR A LARGE SET OF MEMENTOS 680 00:30:54,960 --> 00:30:58,160 THIS CAN GET TEDIOUS. 681 00:30:58,160 --> 00:31:01,400 SO A FEW YEARS AGO WE HAD A 682 00:31:01,400 --> 00:31:03,840 FUNDED PROJECT TO VISUALIZE HOW 683 00:31:03,840 --> 00:31:05,200 INDIVIDUAL WEB PAGES CHANGE OVER 684 00:31:05,200 --> 00:31:05,640 TIME. 685 00:31:05,640 --> 00:31:09,000 IN THIS TOOL A USER COULD SUBMIT 686 00:31:09,000 --> 00:31:12,160 A URL AND BE PROVIDED WITH SCREEN 687 00:31:12,160 --> 00:31:14,080 SHOTS OF SELECTED MEMENTOS 688 00:31:14,080 --> 00:31:15,720 THROUGH TIME. 689 00:31:15,720 --> 00:31:17,680 FOR SOME SEEDS THERE IS A VERY 690 00:31:17,680 --> 00:31:20,080 LARGE NUMBER OF MEMENTOS, TOO 691 00:31:20,080 --> 00:31:22,360 MANY TO DISPLAY AT ONCE SO WE 692 00:31:22,360 --> 00:31:24,120 USE THE AMOUNT OF DIFFERENCE IN 693 00:31:24,120 --> 00:31:28,560 THE HTML SOURCE AS A PROXY FOR 694 00:31:28,560 --> 00:31:29,800 LARGE CHANGES IN THE WEB PAGE. 695 00:31:29,800 --> 00:31:31,800 AND THEN WE RENDERED THOSE 696 00:31:31,800 --> 00:31:34,000 UNIQUE MEMENTOS AND TOOK SCREEN 697 00:31:34,000 --> 00:31:35,880 SHOTS FOR THE DISPLAY. 
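The idea of using HTML source difference as a proxy for visible change can be sketched as follows. The talk does not say which difference measure the project used, so this sketch substitutes Python's standard `difflib.SequenceMatcher`; the function name, the 0.2 threshold and the sample HTML are all assumptions.

```python
from difflib import SequenceMatcher


def select_distinct(mementos, threshold=0.2):
    """Keep only mementos whose HTML differs enough from the last kept one.

    `mementos` is a list of (label, html) pairs in capture order.
    The difference is 1 minus difflib's similarity ratio, an assumed
    stand-in for whatever measure the original tool used.
    """
    kept = []
    last_html = None
    for label, html in mementos:
        if last_html is None:
            changed = True  # always keep the first capture
        else:
            ratio = SequenceMatcher(None, last_html, html).ratio()
            changed = (1 - ratio) > threshold
        if changed:
            kept.append(label)
            last_html = html
    return kept


# Hypothetical captures: an unchanged page, then a redesign.
mementos = [
    ("2020-03-15", "<h1>Cases: 10</h1>"),
    ("2020-03-16", "<h1>Cases: 10</h1>"),
    ("2022-05-01", "<div class='banner'>Vaccines and testing</div>"),
]
kept = select_distinct(mementos)
print(kept)
```

Only the kept mementos would then need to be rendered and screenshotted, which is the expensive step.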
698 00:31:35,880 --> 00:31:39,000 WE EXPERIMENTED WITH A COUPLE OF 699 00:31:39,000 --> 00:31:40,760 DIFFERENT DISPLAY TYPES AND I 700 00:31:40,760 --> 00:31:42,440 FOUND THE GRID HERE TO BE USEFUL 701 00:31:42,440 --> 00:31:45,040 FOR A LARGE-SCALE OVERVIEW AND 702 00:31:45,040 --> 00:31:49,440 THE ANIMATED GIF THAT CAN BE 703 00:31:49,440 --> 00:31:51,680 INSERTED IN WEB PAGES OR EVEN IN 704 00:31:51,680 --> 00:31:55,280 SLIDES AS I'VE DONE HERE. 705 00:31:55,280 --> 00:31:58,600 SO HERE IS ANOTHER MEMENTO FROM 706 00:31:58,600 --> 00:31:59,400 THE COVID COLLECTION. 707 00:31:59,400 --> 00:32:02,560 THIS IS AN ARTICLE ABOUT VACCINE 708 00:32:02,560 --> 00:32:06,200 HESITANCY IN AFRICAN-AMERICAN 709 00:32:06,200 --> 00:32:07,800 COMMUNITIES FROM A LOCAL NEWS 710 00:32:07,800 --> 00:32:09,600 OUTLET IN MICHIGAN. 711 00:32:09,600 --> 00:32:12,120 PUBLISHED IN JULY 2020 BEFORE 712 00:32:12,120 --> 00:32:13,400 VACCINES WERE AVAILABLE AND IT 713 00:32:13,400 --> 00:32:15,160 PROVIDES US WITH A SNAPSHOT IN 714 00:32:15,160 --> 00:32:16,640 TIME OF WHAT PEOPLE WERE 715 00:32:16,640 --> 00:32:17,800 CONCERNED ABOUT. 716 00:32:17,800 --> 00:32:19,680 AND THERE ARE A COUPLE THINGS 717 00:32:19,680 --> 00:32:21,880 THAT I WANT TO NOTE ABOUT THIS 718 00:32:21,880 --> 00:32:22,560 MEMENTO. 719 00:32:22,560 --> 00:32:27,640 IT WAS PUBLISHED IN JULY BUT 720 00:32:27,640 --> 00:32:28,760 ARCHIVED IN THE COLLECTION IN SEPTEMBER 721 00:32:28,760 --> 00:32:31,760 SO CAPTURE DATE DOES NOT MEAN 722 00:32:31,760 --> 00:32:32,560 PUBLISH DATE. 723 00:32:32,560 --> 00:32:34,560 IT'S RECORDING THAT THE PAGE WAS 724 00:32:34,560 --> 00:32:36,360 STILL LIVE IN SEPTEMBER. 725 00:32:36,360 --> 00:32:37,880 SECOND, ONE NICE THING ABOUT 726 00:32:37,880 --> 00:32:39,840 THIS ARTICLE THOUGH IT'S NOT 727 00:32:39,840 --> 00:32:41,800 SHOWN IN MY SCREEN SHOT IS THAT 728 00:32:41,800 --> 00:32:44,480 IT CONTAINS LINKS TO OUTSIDE 729 00:32:44,480 --> 00:32:45,320 RESOURCES. 
730 00:32:45,320 --> 00:32:48,680 MANY OF WHICH ARE ALSO ARCHIVED 731 00:32:48,680 --> 00:32:50,120 BY THE COLLECTION. 732 00:32:50,120 --> 00:32:53,000 SO YOU CAN REVIEW SOURCES AS 733 00:32:53,000 --> 00:32:55,480 THEY EXISTED WHEN THE CAPTURE 734 00:32:55,480 --> 00:32:56,120 WAS MADE. 735 00:32:56,120 --> 00:32:58,800 THIS PAGE IS STILL ON THE LIVE 736 00:32:58,800 --> 00:33:01,760 WEB BUT WE ARE NOT GUARANTEED 737 00:33:01,760 --> 00:33:06,520 THAT IT ALWAYS WILL BE. 738 00:33:06,520 --> 00:33:09,120 WEB ARCHIVES ALLOW US TO HOLD THIS 739 00:33:09,120 --> 00:33:10,760 SNAPSHOT IN TIME IN CASE THE 740 00:33:10,760 --> 00:33:13,760 LIVE WEB PAGE DISAPPEARS. 741 00:33:13,760 --> 00:33:16,040 AND HERE IS AN EXAMPLE OF THAT. 742 00:33:16,040 --> 00:33:18,080 THIS IS THE PAGE FROM THE SOUTH 743 00:33:18,080 --> 00:33:21,320 CAROLINA DEPARTMENT OF EDUCATION 744 00:33:21,320 --> 00:33:24,080 CAPTURED ON JANUARY 14, 2021. 745 00:33:24,080 --> 00:33:26,400 AND THIS WAS THE SOUTH CAROLINA 746 00:33:26,400 --> 00:33:29,200 SCHOOL REOPENING PLAN SUMMARY. 747 00:33:29,200 --> 00:33:34,520 TODAY THAT URL DEDICATION TO 748 00:33:34,520 --> 00:33:36,320 EDUCATION.com JUST REDIRECTS 749 00:33:36,320 --> 00:33:37,520 TO THE SOUTH CAROLINA DEPARTMENT 750 00:33:37,520 --> 00:33:39,960 OF EDUCATION'S COVID NEWSROOM. 751 00:33:39,960 --> 00:33:42,440 SO, IT'S ONLY BECAUSE OF THE WEB 752 00:33:42,440 --> 00:33:44,200 ARCHIVE THAT WE KNOW WHAT THIS 753 00:33:44,200 --> 00:33:49,680 PAGE LOOKED LIKE. 754 00:33:49,680 --> 00:33:52,920 NOW THE NLM COVID COLLECTION 755 00:33:52,920 --> 00:33:56,160 CONTAINS MANY CAPTURES OF SOCIAL 756 00:33:56,160 --> 00:33:58,920 MEDIA DURING THE PANDEMIC. 757 00:33:58,920 --> 00:34:01,480 I FOUND THESE BY SEARCHING FOR 758 00:34:01,480 --> 00:34:04,760 THE TERM HASHTAG. 759 00:34:04,760 --> 00:34:08,960 SOME OF THESE ARE WHAT YOU WOULD 760 00:34:08,960 --> 00:34:09,200 EXPECT. 761 00:34:09,200 --> 00:34:11,320 LIKE COVID-19. 
762 00:34:11,320 --> 00:34:14,160 MY COVID VACCINE AND WEAR A MASK 763 00:34:14,160 --> 00:34:16,160 BUT OTHERS ARE TERMS THAT YOU 764 00:34:16,160 --> 00:34:17,800 MIGHT ONLY KNOW BECAUSE YOU 765 00:34:17,800 --> 00:34:19,280 LIVED THROUGH THIS TIME LIKE 766 00:34:19,280 --> 00:34:21,080 FLATTEN THE CURVE, 767 00:34:21,080 --> 00:34:23,640 I STAYED HOME FOR, 768 00:34:23,640 --> 00:34:27,640 AND SONGS OF COMFORT. 769 00:34:27,640 --> 00:34:30,520 SO THERE WAS THE I STAYED HOME 770 00:34:30,520 --> 00:34:34,240 FOR CHALLENGE WHICH BROUGHT 771 00:34:34,240 --> 00:34:42,120 CELEBRITIES LIKE DOLLY PARTON. 772 00:34:42,120 --> 00:34:45,800 AND IN MARCH THE SONGS OF 773 00:34:45,800 --> 00:34:47,880 COMFORT HASHTAG. 774 00:34:47,880 --> 00:34:49,520 PLAYING FROM HIS HOME. 775 00:34:49,520 --> 00:34:52,160 BOTH OF THESE EXAMPLES ARE A 776 00:34:52,160 --> 00:34:53,760 REMINDER OF THE GENERAL FEELING 777 00:34:53,760 --> 00:34:58,400 IN MARCH AND APRIL OF 2020 OF 778 00:34:58,400 --> 00:35:04,000 UNITY. 779 00:35:04,000 --> 00:35:07,240 -- YOU WILL JUST SEE A STILL 780 00:35:07,240 --> 00:35:09,520 IMAGE AND THE VIDEO DOESN'T 781 00:35:09,520 --> 00:35:10,520 PLAY. 782 00:35:10,520 --> 00:35:12,720 MEDIA LIKE VIDEOS ON SOCIAL 783 00:35:12,720 --> 00:35:15,120 MEDIA ARE TRICKY TO CAPTURE SO 784 00:35:15,120 --> 00:35:17,040 SOMETIMES THE CRAWLER WON'T HAVE 785 00:35:17,040 --> 00:35:18,480 CAPTURED EVERYTHING AND FOR 786 00:35:18,480 --> 00:35:20,280 REPLAY FROM A COLLECTION IN 787 00:35:20,280 --> 00:35:23,160 ARCHIVE-IT THE PLAYBACK ENGINE 788 00:35:23,160 --> 00:35:24,760 WILL ONLY REQUEST RESOURCES 789 00:35:24,760 --> 00:35:29,240 THAT WERE CAPTURED INSIDE THAT 790 00:35:29,240 --> 00:35:29,680 COLLECTION. 
791 00:35:29,680 --> 00:35:32,240 HOWEVER THEY ARE INCLUDED IN THE 792 00:35:32,240 --> 00:35:34,480 INTERNET ARCHIVE'S WAYBACK 793 00:35:34,480 --> 00:35:36,600 MACHINE AND REPLAYS THERE HAVE ACCESS 794 00:35:36,600 --> 00:35:39,240 TO EMBEDDED RESOURCES THAT WERE 795 00:35:39,240 --> 00:35:41,360 MAYBE PART OF OTHER CRAWLS. 796 00:35:41,360 --> 00:35:43,080 SO YOU CAN JUST CHANGE A 797 00:35:43,080 --> 00:35:45,760 PORTION OF THE MEMENTO URL TO 798 00:35:45,760 --> 00:35:47,720 LOAD THE SAME MEMENTO THROUGH 799 00:35:47,720 --> 00:35:50,440 THE WAYBACK MACHINE. YOU JUST 800 00:35:50,440 --> 00:35:54,560 REPLACE WAYBACK.ARCHIVE-IT.ORG 801 00:35:54,560 --> 00:36:01,600 AND IN THIS INSTANCE 4887 WITH 802 00:36:01,600 --> 00:36:08,480 WEB.ARCHIVE.ORG/WEB. 803 00:36:08,480 --> 00:36:12,600 HERE THE VIDEO PLAYING THE SONG 804 00:36:12,600 --> 00:36:14,880 WAS CAPTURED LIKELY BY A 805 00:36:14,880 --> 00:36:16,280 DIFFERENT CRAWL FROM AROUND THE 806 00:36:16,280 --> 00:36:17,200 SAME TIME. 807 00:36:17,200 --> 00:36:19,200 AND SO SINCE THE WAYBACK 808 00:36:19,200 --> 00:36:21,520 MACHINE HAS ACCESS TO USE THE 809 00:36:21,520 --> 00:36:22,920 EMBEDDED RESOURCE IT CAN PLAY 810 00:36:22,920 --> 00:36:26,920 THE VIDEO. 811 00:36:26,920 --> 00:36:29,240 THE BROADER HASHTAGS THAT WE CAN 812 00:36:29,240 --> 00:36:31,480 SEE MAY HAVE MORE MEANING OVER A 813 00:36:31,480 --> 00:36:37,400 LONGER TIME PERIOD. 814 00:36:37,400 --> 00:36:39,760 #COVID-19 IS STILL BEING USED 815 00:36:39,760 --> 00:36:40,240 TODAY. 816 00:36:40,240 --> 00:36:44,400 ONE IS FROM MARCH 4, 2020 AND THE 817 00:36:44,400 --> 00:36:47,600 OTHER FROM MAY 3, 2022. 818 00:36:47,600 --> 00:36:50,680 AND BECAUSE THESE ARE ARCHIVED 819 00:36:50,680 --> 00:36:52,880 WEB PAGES WHEN YOU LOAD THEM YOU 820 00:36:52,880 --> 00:36:57,000 CAN SCROLL THE PAGES AND SEE 821 00:36:57,000 --> 00:36:58,480 MORE TWEETS. 
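The URL change described earlier for replaying an Archive-It memento through the Wayback Machine (swapping the Archive-It host and collection number for web.archive.org/web) can be written as a small helper. This is a minimal sketch; the function name, the timestamp and the target URL in the example are hypothetical, and the pattern assumes the collection number is the first path segment, as in the 4887 example from the talk.

```python
import re

# Archive-It replay prefix: host followed by a numeric collection id.
ARCHIVE_IT_PREFIX = re.compile(r"^https?://wayback\.archive-it\.org/\d+/")


def to_wayback_machine(urim):
    """Rewrite an Archive-It memento URL to the Wayback Machine form.

    URLs that do not start with the Archive-It prefix are returned unchanged.
    """
    return ARCHIVE_IT_PREFIX.sub("https://web.archive.org/web/", urim, count=1)


# Hypothetical memento URL in collection 4887.
archive_it_url = (
    "https://wayback.archive-it.org/4887/20200325000000/"
    "https://twitter.com/example"
)
wayback_url = to_wayback_machine(archive_it_url)
print(wayback_url)
```

The rewritten URL asks the Wayback Machine for the capture closest to the same datetime, so the page that loads may come from a different crawl than the Archive-It one.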
822 00:36:58,480 --> 00:37:01,160 SO ONE THING TO NOTE IS BETWEEN 823 00:37:01,160 --> 00:37:03,120 THIS TIME Twitter CHANGED 824 00:37:03,120 --> 00:37:05,720 THEIR USER INTERFACE AND IN JUNE 825 00:37:05,720 --> 00:37:07,720 OF 2020 FORCED EVERYONE TO 826 00:37:07,720 --> 00:37:10,880 MIGRATE TO THEIR NEW USER 827 00:37:10,880 --> 00:37:12,600 INTERFACE WHICH LOOKED MORE LIKE 828 00:37:12,600 --> 00:37:15,920 THEIR MOBILE APP AND THIS CAUSED 829 00:37:15,920 --> 00:37:18,560 SOME ISSUES FOR WEB ARCHIVES. 830 00:37:18,560 --> 00:37:21,680 IF YOU LOOK THROUGH MEMENTOS 831 00:37:21,680 --> 00:37:23,320 FOR Twitter SEEDS IN THE WAYBACK 832 00:37:23,320 --> 00:37:26,880 MACHINE YOU'LL FIND BETWEEN 833 00:37:26,880 --> 00:37:29,520 JUNE 2020 AND SOME TIME IN 2021 834 00:37:29,520 --> 00:37:31,840 A LOT OF THESE 835 00:37:31,840 --> 00:37:33,920 "SOMETHING WENT WRONG" CAPTURES. 836 00:37:33,920 --> 00:37:36,160 AND DURING THIS TIME WE HAPPENED 837 00:37:36,160 --> 00:37:38,280 TO BE INVESTIGATING SOME 838 00:37:38,280 --> 00:37:40,280 ARCHIVED TWEETS AND NOTICED THIS 839 00:37:40,280 --> 00:37:40,960 ISSUE. 840 00:37:40,960 --> 00:37:44,400 AND IN OUR BLOG POST WE DETAILED 841 00:37:44,400 --> 00:37:46,240 SOME OF THE PROBLEMS THAT 842 00:37:46,240 --> 00:37:48,280 VARIOUS WEB ARCHIVES HAD WITH 843 00:37:48,280 --> 00:37:51,160 ARCHIVING THE NEW Twitter -- 844 00:37:51,160 --> 00:37:53,480 AND IDENTIFIED A WORKAROUND FOR 845 00:37:53,480 --> 00:37:55,200 SOME WEB ARCHIVES AND THIS WAS 846 00:37:55,200 --> 00:37:57,840 HAVING THE ARCHIVES' WEB CRAWLERS 847 00:37:57,840 --> 00:38:00,720 PRETEND TO BE GOOGLE'S CRAWLER. 848 00:38:00,720 --> 00:38:02,280 BUT THIS POINTS TO SOME OF THE 849 00:38:02,280 --> 00:38:03,960 DIFFICULTIES THAT WEB ARCHIVES 850 00:38:03,960 --> 00:38:06,160 CAN ENCOUNTER WHEN TRYING TO 851 00:38:06,160 --> 00:38:07,680 ARCHIVE SOCIAL MEDIA. 
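"Pretending to be Google's crawler" generally means sending Googlebot's User-Agent header with each request. The sketch below only builds such a request with Python's standard library and does not send it; the function name and target URL are hypothetical, and whether any site still responds differently to this header is not something this example can confirm.

```python
from urllib.request import Request

# Googlebot's published desktop User-Agent string.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_crawl_request(url, user_agent=GOOGLEBOT_UA):
    """Build (but do not send) an HTTP request carrying the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical seed URL; no network traffic happens here.
request = build_crawl_request("https://twitter.com/example")
print(request.get_header("User-agent"))
```

A real archival crawler would set this header in its own configuration rather than in ad hoc code; the point here is only to show what the workaround changes on the wire.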
852 00:38:07,680 --> 00:38:09,760 USER INTERFACE AND DELIVERY 853 00:38:09,760 --> 00:38:11,560 CHANGES ARE MADE FOR THE BENEFIT 854 00:38:11,560 --> 00:38:12,520 OF THE PLATFORM. 855 00:38:12,520 --> 00:38:15,720 AND NOT FOR THE BENEFIT OF OR 856 00:38:15,720 --> 00:38:17,160 EVEN WITHOUT REGARD FOR WEB 857 00:38:17,160 --> 00:38:17,760 ARCHIVES. 858 00:38:17,760 --> 00:38:20,960 AND THIS ISSUE HAS BECOME 859 00:38:20,960 --> 00:38:22,840 EVEN MORE IMPORTANT WITH THE 860 00:38:22,840 --> 00:38:24,040 TURBULENCE CURRENTLY HAPPENING 861 00:38:24,040 --> 00:38:27,000 AT Twitter. 862 00:38:27,000 --> 00:38:29,520 SO WE'VE LOOKED A LITTLE BIT AT 863 00:38:29,520 --> 00:38:31,680 THE COVID COLLECTION BUT AS YOU 864 00:38:31,680 --> 00:38:34,480 MIGHT SUSPECT THERE ARE MANY 865 00:38:34,480 --> 00:38:34,720 OTHERS. 866 00:38:34,720 --> 00:38:38,120 ARCHIVE-IT RECENTLY PULLED 867 00:38:38,120 --> 00:38:40,920 TOGETHER 154 COVID COLLECTIONS 868 00:38:40,920 --> 00:38:42,280 INCLUDING THE GLOBAL HEALTH 869 00:38:42,280 --> 00:38:44,360 EVENTS WEB ARCHIVE INTO ONE 870 00:38:44,360 --> 00:38:47,800 PLACE AND THIS ALLOWS USERS TO 871 00:38:47,800 --> 00:38:49,280 SEARCH ACROSS VARIOUS 872 00:38:49,280 --> 00:38:51,560 COLLECTIONS AND ALTHOUGH THE 873 00:38:51,560 --> 00:38:55,040 INTERFACE LOOKS DIFFERENT IT'S A 874 00:38:55,040 --> 00:38:56,120 FRONT END TO THE SAME 875 00:38:56,120 --> 00:38:58,560 COLLECTIONS. 876 00:38:58,560 --> 00:39:00,520 SO WHAT IF A RESEARCHER WAS 877 00:39:00,520 --> 00:39:03,320 INTERESTED IN STUDYING SOME 878 00:39:03,320 --> 00:39:07,120 ASPECT OF COVID AND WANTED TO DO 879 00:39:07,120 --> 00:39:08,120 FURTHER ANALYSIS? 
880 00:39:08,120 --> 00:39:10,040 WITH SO MANY DIFFERENT 881 00:39:10,040 --> 00:39:12,320 COLLECTIONS ALL ABOUT COVID HOW 882 00:39:12,320 --> 00:39:14,560 WOULD THEY DISCOVER WHAT TYPES 883 00:39:14,560 --> 00:39:17,160 OF PAGES ARE INSIDE EACH TO KNOW 884 00:39:17,160 --> 00:39:22,760 WHICH ONE THEY WANTED TO 885 00:39:22,760 --> 00:39:24,200 INVESTIGATE FURTHER? 886 00:39:24,200 --> 00:39:26,000 OUR APPROACH WAS TO PRODUCE A 887 00:39:26,000 --> 00:39:30,520 SOCIAL MEDIA STYLE STORY TO GIVE 888 00:39:30,520 --> 00:39:35,440 A HIGH LEVEL OVERVIEW. 889 00:39:35,440 --> 00:39:38,240 A STORY AS AN ARRANGED SAMPLE 890 00:39:38,240 --> 00:39:42,960 FROM THAT COLLECTION. 891 00:39:42,960 --> 00:39:48,040 WE WERE INSPIRED BY THE 892 00:39:48,040 --> 00:39:49,600 STORIFY SERVICE. 893 00:39:49,600 --> 00:39:52,720 AND IT ALLOWED USERS TO CREATE 894 00:39:52,720 --> 00:39:54,760 AND CONSTRUCT STORIES BY 895 00:39:54,760 --> 00:39:57,280 COMBINING TEXT AND IMAGES AND 896 00:39:57,280 --> 00:39:59,040 PREVIEWS OF WEB PAGES AND SOCIAL 897 00:39:59,040 --> 00:40:01,040 MEDIA POSTS. 898 00:40:01,040 --> 00:40:03,040 BEFORE WE STARTED CREATING 899 00:40:03,040 --> 00:40:04,840 STORIES OURSELVES WE ANALYZED 900 00:40:04,840 --> 00:40:07,080 THE MOST POPULAR STORIFY 901 00:40:07,080 --> 00:40:09,320 STORIES TO FIND OUT WHAT MAKES A 902 00:40:09,320 --> 00:40:10,000 GOOD STORY. 903 00:40:10,000 --> 00:40:13,320 WE FOUND THAT POPULAR STORIES 904 00:40:13,320 --> 00:40:16,200 HAD ABOUT 28 ELEMENTS ON 905 00:40:16,200 --> 00:40:18,880 AVERAGE AND WE USED THIS AS A 906 00:40:18,880 --> 00:40:20,680 ROUGH BOUND ON THE STORIES THAT 907 00:40:20,680 --> 00:40:22,360 WE GENERATE. 
908 00:40:22,360 --> 00:40:24,240 TO PROVIDE AN OVERVIEW OF A 909 00:40:24,240 --> 00:40:25,880 LARGE COLLECTION LIKE THIS ONE 910 00:40:25,880 --> 00:40:29,440 WITH OVER 23,000 MEMENTOS WE 911 00:40:29,440 --> 00:40:33,000 DEVELOPED A SUITE OF TOOLS THAT 912 00:40:33,000 --> 00:40:34,200 CAN TAKE AN ARCHIVE-IT 913 00:40:34,200 --> 00:40:38,440 COLLECTION AND PRODUCE A 914 00:40:38,440 --> 00:40:42,680 STORIFY-LIKE STORY. 915 00:40:42,680 --> 00:40:43,760 METADATA ABOUT THE COLLECTION 916 00:40:43,760 --> 00:40:46,800 AND A STRIKING IMAGE AND WE CALL 917 00:40:46,800 --> 00:40:49,320 THIS THE DARK AND STORMY 918 00:40:49,320 --> 00:40:52,320 ARCHIVES TOOLKIT WITH A NOD TO 919 00:40:52,320 --> 00:40:56,400 "IT WAS A DARK AND STORMY NIGHT." 920 00:40:56,400 --> 00:40:58,120 THE GENERAL PROCESS THAT WE 921 00:40:58,120 --> 00:41:03,560 DEVELOPED FOR STORYTELLING IS TO 922 00:41:03,560 --> 00:41:05,920 SELECT MEMENTOS THAT CAN BE 923 00:41:05,920 --> 00:41:07,480 CONSIDERED AND THEN GENERATE THE 924 00:41:07,480 --> 00:41:09,720 STORY METADATA USING CONTENT 925 00:41:09,720 --> 00:41:11,240 FROM THE COLLECTION AND THE 926 00:41:11,240 --> 00:41:13,320 SELECTED MEMENTOS. 927 00:41:13,320 --> 00:41:15,680 GENERATE METADATA AND THEN 928 00:41:15,680 --> 00:41:16,880 VISUALIZE AND DISTRIBUTE THE 929 00:41:16,880 --> 00:41:19,200 STORY. 930 00:41:19,200 --> 00:41:21,120 WE SPLIT THESE FUNCTIONS INTO 931 00:41:21,120 --> 00:41:24,720 THREE DIFFERENT TOOLS 932 00:41:24,720 --> 00:41:27,240 TO MAKE THE PROCESS MODULAR SO 933 00:41:27,240 --> 00:41:29,560 THAT THE TOOLS COULD BE USED FOR 934 00:41:29,560 --> 00:41:32,960 VARIOUS PURPOSES. 935 00:41:32,960 --> 00:41:35,000 SO HERE IS A CLOSE-UP OF THE 936 00:41:35,000 --> 00:41:36,680 GENERATED STORY THAT I SHOWED 937 00:41:36,680 --> 00:41:37,240 EARLIER. 
938 00:41:37,240 --> 00:41:40,440 THIS WAS GENERATED IN APRIL 2020 939 00:41:40,440 --> 00:41:45,760 USING THE IIPC's COVID-19 940 00:41:45,760 --> 00:41:46,680 COLLECTION. 941 00:41:46,680 --> 00:41:48,440 WHICH ALREADY HAD OVER 10,000 942 00:41:48,440 --> 00:41:52,040 SEEDS AND OVER 20,000 MEMENTOS. 943 00:41:52,040 --> 00:41:56,800 SO WE USED HYPERCANE TO ANALYZE 944 00:41:56,800 --> 00:42:01,120 AND IT SELECTED 36 MEMENTOS. 945 00:42:01,120 --> 00:42:04,840 FROM THOSE HYPERCANE CHOSE A 946 00:42:04,840 --> 00:42:07,200 STRIKING IMAGE AND THEN FROM THE 947 00:42:07,200 --> 00:42:09,400 CONTENT IT COMPUTED THE MOST 948 00:42:09,400 --> 00:42:13,280 FREQUENT ENTITIES AND THE MOST 949 00:42:13,280 --> 00:42:15,040 FREQUENT PHRASES AND THEN WE 950 00:42:15,040 --> 00:42:19,680 USED MEMENTOEMBED TO GENERATE 951 00:42:19,680 --> 00:42:22,400 A SOCIAL CARD FOR EACH AND THEN 952 00:42:22,400 --> 00:42:24,400 THE STORY WEB PAGE WAS BUILT 953 00:42:24,400 --> 00:42:27,640 USING RAINTALE USING THE INPUTS 954 00:42:27,640 --> 00:42:29,920 FROM THOSE PREVIOUS TOOLS. 955 00:42:29,920 --> 00:42:32,720 SO IN SELECTING MEMENTOS FROM AN 956 00:42:32,720 --> 00:42:35,720 ARCHIVE-IT COLLECTION HYPERCANE 957 00:42:35,720 --> 00:42:37,640 RELIES ON THE LIBRARY THAT WE 958 00:42:37,640 --> 00:42:38,720 DEVELOPED. 959 00:42:38,720 --> 00:42:41,520 THIS IS A PYTHON LIBRARY THAT 960 00:42:41,520 --> 00:42:44,560 CONTAINS THE METADATA AND SEED 961 00:42:44,560 --> 00:42:50,560 LIST AND SEED LIST METADATA. 962 00:42:50,560 --> 00:42:54,760 HYPERCANE PROVIDES A SET OF 963 00:42:54,760 --> 00:43:00,800 OPERATORS FOR SELECTING THE 964 00:43:00,800 --> 00:43:04,680 EXEMPLARS FROM A COLLECTION. 
965 00:43:04,680 --> 00:43:09,720 NOW THE SAMPLE OPERATOR 966 00:43:09,720 --> 00:43:13,360 DIRECTLY PRODUCES A SAMPLE BASED 967 00:43:13,360 --> 00:43:15,600 ON EXISTING ALGORITHMS BUT 968 00:43:15,600 --> 00:43:18,000 THEY CAN BE COMPOSED USING OTHER 969 00:43:18,000 --> 00:43:19,600 OPERATORS LIKE FILTER AND 970 00:43:19,600 --> 00:43:20,560 CLUSTER. 971 00:43:20,560 --> 00:43:23,280 AND IN ADDITION TO USING 972 00:43:23,280 --> 00:43:27,080 ALGORITHMS FOR SAMPLING 973 00:43:27,080 --> 00:43:28,880 MEMENTOS THEY CAN TAKE A 974 00:43:28,880 --> 00:43:32,200 FILTERED LIST AS INPUT AND THEN 975 00:43:32,200 --> 00:43:34,400 PRODUCE OUTPUT SUITABLE FOR USE 976 00:43:34,400 --> 00:43:37,520 IN RAINTALE AND IN THIS WAY THE 977 00:43:37,520 --> 00:43:39,920 DSA TOOLKIT CAN BE USED TO 978 00:43:39,920 --> 00:43:42,360 CREATE A HUMAN EXPERT CURATED 979 00:43:42,360 --> 00:43:45,800 STORY. 980 00:43:45,800 --> 00:43:48,040 SO DURING THIS PROCESS WHERE WE 981 00:43:48,040 --> 00:43:49,400 WERE TRYING TO BUILD THESE 982 00:43:49,400 --> 00:43:50,840 SUMMARIES OF COLLECTIONS WE 983 00:43:50,840 --> 00:43:53,480 FOUND THAT THE STANDARD SOCIAL 984 00:43:53,480 --> 00:43:57,240 CARD CREATORS LIKE WHAT TWITTER 985 00:43:57,240 --> 00:43:59,840 DOES DON'T HANDLE MEMENTOS WELL. 986 00:43:59,840 --> 00:44:02,880 SO WE BUILT THE TOOL TO MAKE 987 00:44:02,880 --> 00:44:05,760 SURE THE ORIGINAL SOURCE AND THE 988 00:44:05,760 --> 00:44:08,040 ARCHIVE WERE NOTED ON THE CARD. 989 00:44:08,040 --> 00:44:12,160 MEMENTOEMBED TAKES THE URL AND 990 00:44:12,160 --> 00:44:13,960 THEN GENERATES A CARD THAT CAN 991 00:44:13,960 --> 00:44:16,960 BE EMBEDDED IN A WEB PAGE SO IT 992 00:44:16,960 --> 00:44:19,280 EXTRACTS THE TITLE OF THE WEB 993 00:44:19,280 --> 00:44:22,480 PAGE, A STRIKING IMAGE AND A 994 00:44:22,480 --> 00:44:24,800 SNIPPET FROM THE MEMENTO. 
995 00:44:24,800 --> 00:44:26,040 AND IT INCLUDES INFORMATION 996 00:44:26,040 --> 00:44:29,280 ABOUT THE ARCHIVE THAT IT CAME 997 00:44:29,280 --> 00:44:30,400 FROM. 998 00:44:30,400 --> 00:44:32,000 AND SPECIFIES THE ORIGINAL 999 00:44:32,000 --> 00:44:35,520 SOURCE ALONG WITH THE DATETIME 1000 00:44:35,520 --> 00:44:37,880 THAT THE MEMENTO WAS CAPTURED 1001 00:44:37,880 --> 00:44:40,960 AND IT INCLUDES LINKS TO 1002 00:44:40,960 --> 00:44:43,120 OTHER MEMENTOS OF THAT SAME WEB 1003 00:44:43,120 --> 00:44:43,320 PAGE. 1004 00:44:43,320 --> 00:44:48,680 AND ALSO A LINK TO THE LIVE WEB AND 1005 00:44:48,680 --> 00:44:50,280 CAN ALSO INDICATE IF THE PAGE IS 1006 00:44:50,280 --> 00:44:51,760 NO LONGER AVAILABLE ON THE LIVE 1007 00:44:51,760 --> 00:44:52,320 WEB. 1008 00:44:52,320 --> 00:44:59,680 SO YOU CAN TRY THIS YOURSELF. 1009 00:44:59,680 --> 00:45:02,360 YOU PROVIDE A URL OF THE MEMENTO 1010 00:45:02,360 --> 00:45:07,000 OR AS WE CALL IT A URI-M. 1011 00:45:07,000 --> 00:45:09,280 AND IT WILL GENERATE A SOCIAL 1012 00:45:09,280 --> 00:45:12,400 CARD THAT YOU CAN EMBED INTO A 1013 00:45:12,400 --> 00:45:13,640 WEB PAGE. 1014 00:45:13,640 --> 00:45:16,200 RAINTALE WAS THE FINAL PIECE 1015 00:45:16,200 --> 00:45:18,280 FOR CREATING THIS STORY. 1016 00:45:18,280 --> 00:45:20,840 IT TAKES INPUT AND ALLOWS THE 1017 00:45:20,840 --> 00:45:23,760 USE OF A TEMPLATE TO ARRANGE THE 1018 00:45:23,760 --> 00:45:25,360 METADATA AND SOCIAL CARDS INTO A 1019 00:45:25,360 --> 00:45:26,800 STORY AND THESE ARE THREE 1020 00:45:26,800 --> 00:45:29,200 EXAMPLES OF DIFFERENT TYPES OF 1021 00:45:29,200 --> 00:45:31,200 STORIES THAT CAN BE CREATED. 
1022 00:45:31,200 --> 00:45:34,560 YOU CAN PRODUCE AN HTML WEB 1023 00:45:34,560 --> 00:45:37,120 PAGE, OUTPUT IN MARKDOWN OR AS 1024 00:45:37,120 --> 00:45:40,440 A Twitter THREAD AND THE WEB 1025 00:45:40,440 --> 00:45:41,880 PAGE SHOWN HERE WAS GENERATED 1026 00:45:41,880 --> 00:45:44,200 WITH A TEMPLATE BUILT FOR 1027 00:45:44,200 --> 00:45:46,520 RAINTALE BY THE NATIONAL LIBRARY OF 1028 00:45:46,520 --> 00:45:48,000 AUSTRALIA WHO WE WORKED WITH 1029 00:45:48,000 --> 00:45:50,520 LAST YEAR TO DEVELOP SOME 1030 00:45:50,520 --> 00:45:53,320 WEB-BASED USER INTERFACES FOR THIS 1031 00:45:53,320 --> 00:45:53,880 TOOLCHAIN. 1032 00:45:53,880 --> 00:45:56,000 AS WE'VE BEEN DEVELOPING THE 1033 00:45:56,000 --> 00:45:58,160 TOOLS AND ADDING FEATURES WE 1034 00:45:58,160 --> 00:46:01,360 COLLECTED THE STORIES ONTO THE 1035 00:46:01,360 --> 00:46:04,400 DSA PUDDLES WEBSITE AND THE URL 1036 00:46:04,400 --> 00:46:05,400 IS DOWN HERE. 1037 00:46:05,400 --> 00:46:07,280 SO MOST OF THE STORIES YOU'LL 1038 00:46:07,280 --> 00:46:09,320 SEE ARE NEWS RELATED BECAUSE WE 1039 00:46:09,320 --> 00:46:10,800 HAVE A SEPARATE PROCESS THAT 1040 00:46:10,800 --> 00:46:13,680 GENERATES A STORY FOR THE 1041 00:46:13,680 --> 00:46:15,160 BIGGEST NEWS STORY EACH DAY AND 1042 00:46:15,160 --> 00:46:19,200 THIS IS BASED ON ANOTHER TOOL 1043 00:46:19,200 --> 00:46:21,200 AND IT MONITORS ARTICLES FROM A 1044 00:46:21,200 --> 00:46:23,640 RANGE OF NEWS SITES. 1045 00:46:23,640 --> 00:46:25,080 IF YOU SCROLL DOWN TO THE BOTTOM 1046 00:46:25,080 --> 00:46:28,000 OF THE DSA PUDDLES PAGE YOU CAN 1047 00:46:28,000 --> 00:46:31,760 SELECT THE CATEGORY OF STORIES 1048 00:46:31,760 --> 00:46:39,320 TO VIEW. 1049 00:46:39,320 --> 00:46:41,280 HERE ARE SOME OF THE STORIES 1050 00:46:41,280 --> 00:46:44,080 GENERATED BY ALGORITHMS AND THIS 1051 00:46:44,080 --> 00:46:46,640 INCLUDES STORIES GENERATED BASED 1052 00:46:46,640 --> 00:46:51,600 ON THE ARCHIVE-IT COLLECTIONS. 
1053 00:46:51,600 --> 00:46:53,480 HERE ARE THREE OF THE STORIES 1054 00:46:53,480 --> 00:46:56,720 THAT WE GENERATED FROM NLM WEB 1055 00:46:56,720 --> 00:46:57,680 ARCHIVE COLLECTIONS. 1056 00:46:57,680 --> 00:47:03,080 WE APPLIED THE DSA TOOLKIT TO 1057 00:47:03,080 --> 00:47:06,920 THE OPIOID ARCHIVE. 1058 00:47:06,920 --> 00:47:10,600 AND THE ENVIRONMENTAL HEALTH 1059 00:47:10,600 --> 00:47:13,440 COLLECTION AND EACH WAS 1060 00:47:13,440 --> 00:47:14,840 GENERATED WITH THE GOAL OF 1061 00:47:14,840 --> 00:47:17,360 SELECTING MEMENTOS ON DIFFERENT 1062 00:47:17,360 --> 00:47:20,280 TOPICS FROM A RANGE OF DATETIMES 1063 00:47:20,280 --> 00:47:22,960 ACROSS THE COLLECTION AND 1064 00:47:22,960 --> 00:47:27,560 THE LINKS WILL TAKE YOU TO THE 1065 00:47:27,560 --> 00:47:28,600 FULL STORY. 1066 00:47:28,600 --> 00:47:30,720 AGAIN THE GOAL OF THE DSA 1067 00:47:30,720 --> 00:47:32,720 STORIES IS TO PROVIDE A QUICK 1068 00:47:32,720 --> 00:47:34,840 OVERVIEW OF THE CONTENTS OF A 1069 00:47:34,840 --> 00:47:36,200 COLLECTION SO THAT USERS CAN 1070 00:47:36,200 --> 00:47:38,160 BROWSE THE STORIES TO DETERMINE 1071 00:47:38,160 --> 00:47:39,320 WHICH COLLECTION THEY MIGHT WANT 1072 00:47:39,320 --> 00:47:43,800 TO EXPLORE FURTHER. 1073 00:47:43,800 --> 00:47:46,880 SO OUR DSA TOOLKIT IS ONE 1074 00:47:46,880 --> 00:47:48,360 APPROACH FOR COLLECTION OWNERS 1075 00:47:48,360 --> 00:47:50,920 TO HIGHLIGHT SOME OF WHAT IS 1076 00:47:50,920 --> 00:47:51,520 CONTAINED. 1077 00:47:51,520 --> 00:47:55,880 BUT IT IS NOT THE ONLY WAY. 1078 00:47:55,880 --> 00:47:58,040 THE COVID-19 ARCHIVE THAT I 1079 00:47:58,040 --> 00:48:00,560 SHOWED EARLIER WHICH COMBINES 1080 00:48:00,560 --> 00:48:02,280 THE 150 PLUS COVID COLLECTIONS 1081 00:48:02,280 --> 00:48:05,680 AT ARCHIVE-IT HAS A FEATURED 1082 00:48:05,680 --> 00:48:07,360 COLLECTIONS AREA THAT GROUPS 1083 00:48:07,360 --> 00:48:08,800 RELATED COLLECTIONS TOGETHER FOR 1084 00:48:08,800 --> 00:48:11,360 EASIER DISCOVERY. 
1085 00:48:11,360 --> 00:48:13,120 THE INTERNET ARCHIVE HAS BEEN 1086 00:48:13,120 --> 00:48:15,760 WORKING ON GATHERING TOPICAL 1087 00:48:15,760 --> 00:48:17,600 COLLECTIONS OF ITS HOLDINGS AS 1088 00:48:17,600 --> 00:48:18,160 WELL. 1089 00:48:18,160 --> 00:48:20,320 IN CONJUNCTION WITH THE RECENT 1090 00:48:20,320 --> 00:48:22,880 RELEASE OF DEMOCRACY'S LIBRARY 1091 00:48:22,880 --> 00:48:23,960 THEY PULLED TOGETHER THE 1092 00:48:23,960 --> 00:48:25,680 GOVERNMENT WEBSITES OF THE WORLD 1093 00:48:25,680 --> 00:48:27,400 WHERE THEY FILTERED AND GROUPED 1094 00:48:27,400 --> 00:48:29,280 THEIR HOLDINGS BY GOVERNMENT 1095 00:48:29,280 --> 00:48:33,040 COUNTRY CODE. 1096 00:48:33,040 --> 00:48:35,280 ALSO PROVIDING A KEYWORD SEARCH 1097 00:48:35,280 --> 00:48:36,920 FOR ABOUT A DOZEN COLLECTIONS 1098 00:48:36,920 --> 00:48:38,360 AND THIS IS AVAILABLE FROM THE 1099 00:48:38,360 --> 00:48:40,840 MAIN WAYBACK MACHINE WEB PAGE. 1100 00:48:40,840 --> 00:48:46,120 THESE COLLECTIONS INCLUDE PDFs 1101 00:48:46,120 --> 00:48:47,800 FROM .GOV WEBSITES. 1102 00:48:47,800 --> 00:48:50,680 PowerPoint PRESENTATIONS AND 1103 00:48:50,680 --> 00:48:52,800 TWO OF THE REGULAR PRESIDENTIAL 1104 00:48:52,800 --> 00:48:53,400 CRAWLS. 1105 00:48:53,400 --> 00:48:55,360 ALL ARE HELPING WITH DISCOVERY 1106 00:48:55,360 --> 00:48:57,000 BUT WE NEED MORE. 1107 00:48:57,000 --> 00:48:59,840 ONE TOOL OR ONE APPROACH WON'T 1108 00:48:59,840 --> 00:49:03,560 FIT EVERY SITUATION. 1109 00:49:03,560 --> 00:49:06,080 ANOTHER CHALLENGE IS THAT ONCE A 1110 00:49:06,080 --> 00:49:07,520 RESEARCHER IDENTIFIES A 1111 00:49:07,520 --> 00:49:09,320 COLLECTION HOW CAN THEY ACTUALLY 1112 00:49:09,320 --> 00:49:10,600 WORK WITH THE CONTENT? 1113 00:49:10,600 --> 00:49:13,560 SO THE ARCHIVES UNLEASHED PROJECT HAS BEEN 1114 00:49:13,560 --> 00:49:15,400 DEVELOPING TOOLS FOR ANALYSIS OF 1115 00:49:15,400 --> 00:49:18,600 THE CONTENT FOR SEVERAL YEARS.
1116 00:49:18,600 --> 00:49:20,720 JUST TO NOTE THIS IS NOT OUR 1117 00:49:20,720 --> 00:49:22,200 GROUP'S WORK BUT I HAVE SERVED 1118 00:49:22,200 --> 00:49:24,560 ON THIS PROJECT'S ADVISORY 1119 00:49:24,560 --> 00:49:25,120 BOARD. 1120 00:49:25,120 --> 00:49:27,720 THE TOOLKIT IS A SET OF TOOLS 1121 00:49:27,720 --> 00:49:31,440 BUILT ON APACHE SPARK TO 1122 00:49:31,440 --> 00:49:33,440 GENERATE DERIVATIVES. 1123 00:49:33,440 --> 00:49:35,440 AND THIS CAN INCLUDE THINGS LIKE 1124 00:49:35,440 --> 00:49:37,560 CRAWL FREQUENCY, HOW MANY TIMES 1125 00:49:37,560 --> 00:49:39,720 PARTICULAR PAGES WERE CRAWLED. 1126 00:49:39,720 --> 00:49:42,640 LINK DIAGRAMS WHICH VISUALIZE 1127 00:49:42,640 --> 00:49:44,040 WHICH PAGES LINK TO OTHER PAGES 1128 00:49:44,040 --> 00:49:45,360 IN THE COLLECTION. 1129 00:49:45,360 --> 00:49:48,120 DOMAIN FREQUENCY. 1130 00:49:48,120 --> 00:49:50,120 WHAT ARE THE DOMAINS THAT ARE 1131 00:49:50,120 --> 00:49:51,640 CONTAINED IN THE COLLECTION. 1132 00:49:51,640 --> 00:49:54,760 NAMED ENTITY EXTRACTIONS AND 1133 00:49:54,760 --> 00:49:56,960 FULL TEXT EXTRACTIONS BUT IT 1134 00:49:56,960 --> 00:49:57,640 REQUIRES PROGRAMMING KNOWLEDGE 1135 00:49:57,640 --> 00:49:59,640 AND THAT THE COLLECTION'S WARC 1136 00:49:59,640 --> 00:50:01,800 FILES BE LOCAL TO WHERE THE 1137 00:50:01,800 --> 00:50:03,640 PROGRAM IS RUN SO THE USER MUST 1138 00:50:03,640 --> 00:50:05,880 HAVE ENOUGH STORAGE TO BE ABLE 1139 00:50:05,880 --> 00:50:07,240 TO DOWNLOAD THE COLLECTION. 1140 00:50:07,240 --> 00:50:09,320 HOWEVER IN CONJUNCTION WITH THE 1141 00:50:09,320 --> 00:50:12,080 INTERNET ARCHIVE THE ARCHIVES 1142 00:50:12,080 --> 00:50:15,120 UNLEASHED TEAM IS DEVELOPING 1143 00:50:15,120 --> 00:50:16,440 ARCH. 1144 00:50:16,440 --> 00:50:18,480 AND THIS WOULD ALLOW THE 1145 00:50:18,480 --> 00:50:20,840 GENERATION OF THESE DERIVATIVES 1146 00:50:20,840 --> 00:50:23,360 TO BE PERFORMED AT THE INTERNET 1147 00:50:23,360 --> 00:50:25,720 ARCHIVE.
1148 00:50:25,720 --> 00:50:27,480 AND USERS COULD THEN 1149 00:50:27,480 --> 00:50:29,480 DOWNLOAD THE MUCH SMALLER 1150 00:50:29,480 --> 00:50:32,600 DERIVATIVES FOR USE IN THEIR OWN 1151 00:50:32,600 --> 00:50:36,440 ANALYSIS TOOLS. 1152 00:50:36,440 --> 00:50:39,880 BOTH ARCHIVES UNLEASHED AND THE 1153 00:50:39,880 --> 00:50:41,800 PROJECT HAVE SEVERAL EXAMPLES 1154 00:50:41,800 --> 00:50:43,440 THAT CAN BE DONE WITH WEB 1155 00:50:43,440 --> 00:50:44,040 ARCHIVES. 1156 00:50:44,040 --> 00:50:45,880 WHAT PEOPLE CAN DO WITH THE 1157 00:50:45,880 --> 00:50:48,240 DERIVATIVES THAT THEY'VE 1158 00:50:48,240 --> 00:50:48,560 DOWNLOADED. 1159 00:50:48,560 --> 00:50:50,320 SO I WANTED TO INCLUDE A COUPLE 1160 00:50:50,320 --> 00:50:53,880 OF LINKS TO RECAPS AND VIDEOS OF 1161 00:50:53,880 --> 00:50:55,200 PRESENTATIONS THAT RESEARCHERS 1162 00:50:55,200 --> 00:50:56,880 HAVE MADE ABOUT BOTH OF THESE 1163 00:50:56,880 --> 00:50:58,400 PROJECTS AND THIS IS PART OF THE 1164 00:50:58,400 --> 00:51:00,840 INTERNET ARCHIVE'S LIBRARY AS 1165 00:51:00,840 --> 00:51:01,320 LABORATORY PROGRAM. 1166 00:51:01,320 --> 00:51:03,320 SO IF YOU'RE INTERESTED IN SOME 1167 00:51:03,320 --> 00:51:04,720 OF THE WAYS THAT RESEARCHERS 1168 00:51:04,720 --> 00:51:07,800 HAVE BEEN USING AND ANALYZING 1169 00:51:07,800 --> 00:51:09,960 DATA I WOULD ENCOURAGE YOU TO 1170 00:51:09,960 --> 00:51:13,840 CHECK THESE OUT. 1171 00:51:13,840 --> 00:51:16,120 SO THE TOOLS I JUST MENTIONED 1172 00:51:16,120 --> 00:51:18,040 ALLOW USERS TO WORK ON THE DATA 1173 00:51:18,040 --> 00:51:20,680 FROM A COLLECTION IN AGGREGATE. 1174 00:51:20,680 --> 00:51:23,760 BUT WHAT IF YOU WANTED TO 1175 00:51:23,760 --> 00:51:26,120 IDENTIFY FEATURES IN PARTICULAR 1176 00:51:26,120 --> 00:51:29,240 MEMENTOS OR FIND THOSE THAT HAVE 1177 00:51:29,240 --> 00:51:31,880 CERTAIN QUALITIES?
1178 00:51:31,880 --> 00:51:35,200 HOW CAN WE HELP USERS FIND 1179 00:51:35,200 --> 00:51:37,360 MEMENTOS WITH SPECIFIC 1180 00:51:37,360 --> 00:51:40,200 CRITERIA WHEN THE EXACT 1181 00:51:40,200 --> 00:51:44,440 DATETIME OR THE EXACT URL IS NOT 1182 00:51:44,440 --> 00:51:49,800 KNOWN? 1183 00:51:49,800 --> 00:51:52,160 THEY CLASSIFIED THE USER TASKS 1184 00:51:52,160 --> 00:51:53,280 OF JOURNALISTS. 1185 00:51:53,280 --> 00:51:55,280 THE MOST COMMON WAS TO FIND A 1186 00:51:55,280 --> 00:52:00,360 PAGE THAT HAD BEEN DELETED FROM 1187 00:52:00,360 --> 00:52:07,760 THE LIVE WEB. 1188 00:52:07,760 --> 00:52:12,760 AND ALTHOUGH THESE ARE DIFFERENT 1189 00:52:12,760 --> 00:52:14,760 THEY MAY BE SIMILAR TO THOSE 1190 00:52:14,760 --> 00:52:16,440 TASKS THAT RESEARCHERS WOULD 1191 00:52:16,440 --> 00:52:19,240 WANT TO PERFORM ONCE THEY HAD 1192 00:52:19,240 --> 00:52:20,160 IDENTIFIED A SUITABLE 1193 00:52:20,160 --> 00:52:21,880 COLLECTION. 1194 00:52:21,880 --> 00:52:25,800 SO WEB ARCHIVES STORE AN 1195 00:52:25,800 --> 00:52:27,920 IMMENSE AMOUNT OF DATA BUT THE 1196 00:52:27,920 --> 00:52:29,320 BARRIERS ARE SUCH THAT 1197 00:52:29,320 --> 00:52:31,160 RESEARCHERS AND SCHOLARS MAY NOT 1198 00:52:31,160 --> 00:52:32,520 KNOW WHAT IS AVAILABLE. 1199 00:52:32,520 --> 00:52:34,160 COLLECTIONS CAN HELP BUT FOR 1200 00:52:34,160 --> 00:52:36,200 LARGE COLLECTIONS EVEN THE 1201 00:52:36,200 --> 00:52:37,960 CURATORS MAY NOT KNOW EVERYTHING 1202 00:52:37,960 --> 00:52:39,640 THAT IS CONTAINED IN THEIR 1203 00:52:39,640 --> 00:52:39,960 COLLECTIONS. 1204 00:52:39,960 --> 00:52:42,640 SO WE NEED TOOLS THAT CAN HELP 1205 00:52:42,640 --> 00:52:44,040 PEOPLE NOT ONLY FIND WHAT THEY 1206 00:52:44,040 --> 00:52:47,000 ARE LOOKING FOR BUT ALSO BROWSE 1207 00:52:47,000 --> 00:52:48,680 WHAT IS AVAILABLE.
1208 00:52:48,680 --> 00:52:50,040 SOMETIMES YOU DON'T KNOW WHAT 1209 00:52:50,040 --> 00:52:52,520 YOU'RE LOOKING FOR OR EVEN HOW 1210 00:52:52,520 --> 00:52:54,480 TO ASK FOR WHAT YOU WANT UNTIL 1211 00:52:54,480 --> 00:52:56,200 YOU CAN SEE WHAT QUESTIONS MIGHT 1212 00:52:56,200 --> 00:52:57,080 BE POSSIBLE. 1213 00:52:57,080 --> 00:52:59,240 NOW, WE'VE COME A LONG WAY IN 1214 00:52:59,240 --> 00:53:00,880 HELPING RESEARCHERS ACCESS AND 1215 00:53:00,880 --> 00:53:02,640 USE WEB ARCHIVES BUT WE STILL 1216 00:53:02,640 --> 00:53:04,800 HAVE A LOT OF WORK AHEAD OF US 1217 00:53:04,800 --> 00:53:07,560 IN MAKING THESE TOOLS EASIER TO 1218 00:53:07,560 --> 00:53:09,640 USE AND HELPING RESEARCHERS WITH 1219 00:53:09,640 --> 00:53:11,160 DISCOVERY. 1220 00:53:11,160 --> 00:53:15,200 SO TO WRAP UP AND RECAP MY MAIN 1221 00:53:15,200 --> 00:53:15,640 POINTS. 1222 00:53:15,640 --> 00:53:16,960 WEB ARCHIVES ARE IMPORTANT TO 1223 00:53:16,960 --> 00:53:19,200 UNDERSTANDING THE RECENT PAST 1224 00:53:19,200 --> 00:53:21,640 AND COLLECTIONS ARE AN IMPORTANT 1225 00:53:21,640 --> 00:53:23,960 PART OF ALLOWING RESEARCHERS TO 1226 00:53:23,960 --> 00:53:26,200 DISCOVER ARCHIVED CONTENT. 1227 00:53:26,200 --> 00:53:28,520 OUR GROUP HAS BEEN STUDYING THE 1228 00:53:28,520 --> 00:53:29,960 PROBLEMS ASSOCIATED WITH ACCESS 1229 00:53:29,960 --> 00:53:33,400 TO WEB ARCHIVES AND CHALLENGES 1230 00:53:33,400 --> 00:53:36,800 TO ARCHIVING WEB PAGES FOR OVER 1231 00:53:36,800 --> 00:53:38,600 A DECADE AND WE HOPE SOME OF OUR 1232 00:53:38,600 --> 00:53:40,400 TOOLS CAN HELP PROVIDE 1233 00:53:40,400 --> 00:53:41,760 UNDERSTANDING OF LARGE WEB 1234 00:53:41,760 --> 00:53:44,960 ARCHIVE COLLECTIONS. 1235 00:53:44,960 --> 00:53:46,720 CHALLENGES IN DISCOVERY AND 1236 00:53:46,720 --> 00:53:48,360 RESEARCHER ACCESS REMAIN.
1237 00:53:48,360 --> 00:53:50,000 AND WE'RE CONTINUING TO THINK OF 1238 00:53:50,000 --> 00:53:52,440 WAYS THAT WE CAN SUPPORT 1239 00:53:52,440 --> 00:53:55,560 RESEARCHERS' DESIRED TASKS AND TO 1240 00:53:55,560 --> 00:53:57,560 HELP PROMOTE THE USE. 1241 00:53:57,560 --> 00:53:59,640 AND FOR MORE INFORMATION ON OUR 1242 00:53:59,640 --> 00:54:02,120 GROUP'S WORK I'VE PUT OUR 1243 00:54:02,120 --> 00:54:04,320 GROUP'S WEB PAGE AND OUR 1244 00:54:04,320 --> 00:54:06,480 Twitter HANDLE IN THE FOOTER OF 1245 00:54:06,480 --> 00:54:08,680 THESE SLIDES AND I'VE INCLUDED 1246 00:54:08,680 --> 00:54:12,320 THE LINK TO OUR GitHub 1247 00:54:12,320 --> 00:54:14,960 ORGANIZATION. 1248 00:54:14,960 --> 00:54:16,720 SO THANK YOU FOR YOUR ATTENTION 1249 00:54:16,720 --> 00:54:23,840 TODAY. 1250 00:54:23,840 --> 00:54:25,800 >>DR. WEIGLE, THANK YOU FOR THAT 1251 00:54:25,800 --> 00:54:26,280 GREAT TALK. 1252 00:54:26,280 --> 00:54:29,120 I'VE LEARNED A LOT AND I'M SURE 1253 00:54:29,120 --> 00:54:30,800 MANY WHO ARE WATCHING DID AS 1254 00:54:30,800 --> 00:54:38,000 WELL. 1255 00:54:38,000 --> 00:54:39,880 AND APPRECIATE WHAT YOU 1256 00:54:39,880 --> 00:54:41,280 PRESENTED AND THEIR 1257 00:54:41,280 --> 00:54:43,160 CONTRIBUTIONS TO ALL THAT YOU 1258 00:54:43,160 --> 00:54:43,960 PRESENTED AND MUCH MORE OF 1259 00:54:43,960 --> 00:54:45,280 COURSE. 1260 00:54:45,280 --> 00:54:48,480 WE HAVE TIME FOR Q&A. 1261 00:54:48,480 --> 00:54:50,440 SO I WOULD LIKE TO REMIND THOSE 1262 00:54:50,440 --> 00:54:52,440 WHO ARE WATCHING PLEASE USE THE 1263 00:54:52,440 --> 00:54:54,440 LIVE FEEDBACK BUTTON UNDERNEATH 1264 00:54:54,440 --> 00:54:57,600 THE VIDEO STREAM TO SEND 1265 00:54:57,600 --> 00:54:59,120 QUESTIONS TO DR. WEIGLE. 1266 00:54:59,120 --> 00:55:01,360 WE HAVE A COUPLE OF QUESTIONS TO 1267 00:55:01,360 --> 00:55:05,040 BEGIN. 1268 00:55:05,040 --> 00:55:07,080 THANK YOU FOR THIS INFORMATIVE 1269 00:55:07,080 --> 00:55:09,160 AND FASCINATING TALK.
1270 00:55:09,160 --> 00:55:11,960 YOU NOTED EARLIER THAT IT WOULD BE 1271 00:55:11,960 --> 00:55:14,400 DIFFICULT TO STUDY THE HISTORY 1272 00:55:14,400 --> 00:55:16,880 OF THE 1990s WITHOUT AN 1273 00:55:16,880 --> 00:55:19,240 INTERNET ARCHIVE BUT WEB 1274 00:55:19,240 --> 00:55:20,680 ARCHIVING WAS MUCH MORE LIMITED 1275 00:55:20,680 --> 00:55:24,320 IN THE EARLY DAYS. 1276 00:55:24,320 --> 00:55:26,960 IF SOMEONE HAS THE EARLY CODING 1277 00:55:26,960 --> 00:55:29,000 IS IT STILL POSSIBLE TO ADD THAT 1278 00:55:29,000 --> 00:55:33,000 TO AN ARCHIVE LIKE ARCHIVE-IT? 1279 00:55:33,000 --> 00:55:36,120 >>THAT IS A REALLY GOOD 1280 00:55:36,120 --> 00:55:36,400 QUESTION. 1281 00:55:36,400 --> 00:55:43,280 SO ONE OF OUR FORMER 1282 00:55:43,280 --> 00:55:45,520 GRADUATES WAS WORKING AT 1283 00:55:45,520 --> 00:55:49,160 STANFORD AND HELPED THEM TO 1284 00:55:49,160 --> 00:55:51,120 DISCOVER THE VERY FIRST WEB PAGE 1285 00:55:51,120 --> 00:55:56,040 THAT WAS BUILT AT I THINK IT WAS 1286 00:55:56,040 --> 00:55:59,360 SLAC, A STANFORD LABORATORY, AND 1287 00:55:59,360 --> 00:56:00,360 THEY HAD JUST THAT. 1288 00:56:00,360 --> 00:56:03,040 THEY HAD THE COMPONENTS OF THE 1289 00:56:03,040 --> 00:56:04,480 WEB PAGE. 1290 00:56:04,480 --> 00:56:06,800 AND THEY WERE ABLE TO 1291 00:56:06,800 --> 00:56:11,120 ESSENTIALLY FORMULATE IT INTO 1292 00:56:11,120 --> 00:56:13,440 WARC FILES SO THEY COULD THEN 1293 00:56:13,440 --> 00:56:19,080 REPLAY THAT BACK. 1294 00:56:19,080 --> 00:56:20,360 IT CAN BE DONE. 1295 00:56:20,360 --> 00:56:22,800 BUT I DON'T KNOW NECESSARILY 1296 00:56:22,800 --> 00:56:26,240 THAT THERE IS A TOOL BUILT RIGHT 1297 00:56:26,240 --> 00:56:28,760 NOW THAT WOULD ALLOW YOU TO 1298 00:56:28,760 --> 00:56:30,920 SUBMIT ALL OF THE FILES THAT 1299 00:56:30,920 --> 00:56:32,760 WOULD BUILD A WEB PAGE AND MAKE 1300 00:56:32,760 --> 00:56:35,400 IT AVAILABLE.
1301 00:56:35,400 --> 00:56:41,040 BUT -- THE FOLKS AT INTERNET 1302 00:56:41,040 --> 00:56:43,000 ARCHIVE MAY BE ABLE TO HELP WITH 1303 00:56:43,000 --> 00:56:44,040 THAT. 1304 00:56:44,040 --> 00:56:44,760 >>THANK YOU. 1305 00:56:44,760 --> 00:56:46,360 ANOTHER QUESTION, KNOWING AND 1306 00:56:46,360 --> 00:56:48,240 DOING WHAT YOU AND YOUR RESEARCH 1307 00:56:48,240 --> 00:56:50,320 TEAM YOUR COLLEAGUES ARE DOING 1308 00:56:50,320 --> 00:56:52,520 NOW WHAT IS ON THE HORIZON FOR 1309 00:56:52,520 --> 00:56:54,000 WEB ARCHIVING? 1310 00:56:54,000 --> 00:56:56,520 WHERE ARE INNOVATIONS HAPPENING 1311 00:56:56,520 --> 00:56:59,200 TO KEEP UP WITH COLLECTING THIS 1312 00:56:59,200 --> 00:56:59,960 DIGITAL CONTENT? 1313 00:56:59,960 --> 00:57:01,880 >>THAT IS A GREAT QUESTION. 1314 00:57:01,880 --> 00:57:04,120 AND ONE OF THE THINGS THAT WE 1315 00:57:04,120 --> 00:57:07,800 HAVE TO DO IS WE HAVE TO BE 1316 00:57:07,800 --> 00:57:10,640 AWARE OF THE CHANGING TECHNOLOGY 1317 00:57:10,640 --> 00:57:12,960 ON THE WEB ITSELF. 1318 00:57:12,960 --> 00:57:16,920 BECAUSE YES, WE'RE ALWAYS 1319 00:57:16,920 --> 00:57:18,640 CHASING WHAT ARE THE NEW THINGS 1320 00:57:18,640 --> 00:57:21,720 THAT ARE BEING DEVELOPED. 1321 00:57:21,720 --> 00:57:23,600 BECAUSE AGAIN THE PLATFORMS AND 1322 00:57:23,600 --> 00:57:25,600 THE TECHNOLOGIES USED TO BUILD 1323 00:57:25,600 --> 00:57:28,160 WEB PAGES ARE NOT BUILT 1324 00:57:28,160 --> 00:57:29,920 WITH WEB ARCHIVING IN MIND. 1325 00:57:29,920 --> 00:57:32,600 THEY ARE NOT MAKING IT EASY FOR 1326 00:57:32,600 --> 00:57:35,160 THESE THINGS TO BE ARCHIVED 1327 00:57:35,160 --> 00:57:36,360 NECESSARILY. 1328 00:57:36,360 --> 00:57:38,160 AND IT'S NOT THAT THEY DON'T 1329 00:57:38,160 --> 00:57:39,760 WANT THINGS TO BE ARCHIVED BUT 1330 00:57:39,760 --> 00:57:42,280 THAT IS NOT THEIR MAIN GOAL. 1331 00:57:42,280 --> 00:57:43,640 THEIR MAIN GOAL IS SPEED AND 1332 00:57:43,640 --> 00:57:44,560 DELIVERY.
1333 00:57:44,560 --> 00:57:52,960 AND SO WE HAVE TO CONTINUE TO BE 1334 00:57:52,960 --> 00:57:55,840 UP ON HOW WEB PAGES ARE BEING 1335 00:57:55,840 --> 00:57:59,040 BUILT TO BE ABLE TO REACT TO 1336 00:57:59,040 --> 00:58:02,200 CHANGES IN THE WEB. 1337 00:58:02,200 --> 00:58:03,920 OTHER THINGS THAT -- SOME OF THE 1338 00:58:03,920 --> 00:58:06,120 THINGS THAT I MENTIONED HERE -- 1339 00:58:06,120 --> 00:58:07,400 THERE IS THE ISSUE OF THE 1340 00:58:07,400 --> 00:58:08,640 CAPTURE. 1341 00:58:08,640 --> 00:58:10,920 WHICH IS WE HAVE TO KEEP UP WITH 1342 00:58:10,920 --> 00:58:11,840 THE TECHNOLOGY. 1343 00:58:11,840 --> 00:58:14,640 ALSO THE ISSUE OF WE HAVE SO 1344 00:58:14,640 --> 00:58:15,760 MUCH STUFF ALREADY. 1345 00:58:15,760 --> 00:58:18,120 HOW CAN WE HELP PEOPLE DISCOVER 1346 00:58:18,120 --> 00:58:20,880 AND USE IT. 1347 00:58:20,880 --> 00:58:23,760 SO I MEAN BOTH OF THOSE HAVE 1348 00:58:23,760 --> 00:58:26,680 LOTS OF CHALLENGES. 1349 00:58:26,680 --> 00:58:28,280 >>THANK YOU. 1350 00:58:28,280 --> 00:58:30,960 SO WE HAVE ANOTHER QUESTION 1351 00:58:30,960 --> 00:58:33,520 REALLY ABOUT PRESERVATION. 1352 00:58:33,520 --> 00:58:36,440 THIS PERSON WRITES IN, "SINCE 1353 00:58:36,440 --> 00:58:38,880 THE INTERNET ARCHIVE IS A 1354 00:58:38,880 --> 00:58:41,040 PRIVATE BASED SYSTEM IT EXISTS 1355 00:58:41,040 --> 00:58:43,600 ON WHAT PEOPLE CALL SOFT MONEY 1356 00:58:43,600 --> 00:58:46,080 AND COULD BE SEEN AS VULNERABLE 1357 00:58:46,080 --> 00:58:48,560 TO FUNDING SHORTAGES THAT COULD 1358 00:58:48,560 --> 00:58:51,520 MAKE IT DISAPPEAR. 1359 00:58:51,520 --> 00:58:54,000 WHAT ARE THE CONCERNS ABOUT THIS 1360 00:58:54,000 --> 00:58:56,560 POSSIBILITY AND HOW ARE THEY 1361 00:58:56,560 --> 00:59:06,720 MITIGATED?" 1362 00:59:07,680 --> 00:59:10,640 >>HONESTLY I DON'T HAVE A GOOD 1363 00:59:10,640 --> 00:59:12,160 ANSWER FOR THAT. 1364 00:59:12,160 --> 00:59:16,160 THOSE ARE REAL CONCERNS.
1365 00:59:16,160 --> 00:59:21,320 AND I KNOW THAT TYPICALLY 1366 00:59:21,320 --> 00:59:24,280 LIBRARIES ARE UNDERFUNDED. 1367 00:59:24,280 --> 00:59:26,960 AND THESE ARCHIVING EFFORTS THAT 1368 00:59:26,960 --> 00:59:30,160 ARE TAKING PLACE IT TAKES A LOT 1369 00:59:30,160 --> 00:59:33,240 OF MONEY TO RUN THESE SERVERS 1370 00:59:33,240 --> 00:59:36,280 FOR THE STORAGE AND FOR ALSO THE 1371 00:59:36,280 --> 00:59:38,520 ACCESS FOR THE PUBLIC TO BE ABLE 1372 00:59:38,520 --> 00:59:41,600 TO REPLAY THESE WEB PAGES. 1373 00:59:41,600 --> 00:59:44,800 SO I MEAN IT WOULD BE GREAT IF 1374 00:59:44,800 --> 00:59:50,000 THERE WERE A LARGE INFUSION OF 1375 00:59:50,000 --> 00:59:52,120 MONEY INTO THESE EFFORTS. 1376 00:59:52,120 --> 00:59:56,720 BUT I DON'T KNOW THE ANSWER TO 1377 00:59:56,720 --> 00:59:57,440 THAT. 1378 00:59:57,440 --> 01:00:00,800 BUT I THINK IT'S A REAL PROBLEM. 1379 01:00:00,800 --> 01:00:05,160 >>THANK YOU. 1380 01:00:05,160 --> 01:00:06,280 SOMEONE WRITES IN THANKS 1381 01:00:06,280 --> 01:00:08,200 DR. WEIGLE FOR THIS GREAT TALK. 1382 01:00:08,200 --> 01:00:09,960 THIS WAS AN EXCELLENT OVERVIEW. 1383 01:00:09,960 --> 01:00:12,160 I WOULD LIKE TO REVIEW THE 1384 01:00:12,160 --> 01:00:13,360 SLIDES AGAIN ESPECIALLY THE 1385 01:00:13,360 --> 01:00:15,320 VALUABLE LINKS THAT YOU PROVIDED 1386 01:00:15,320 --> 01:00:18,560 AND WILL NOTE THE URL IS AT THE 1387 01:00:18,560 --> 01:00:20,880 BOTTOM OF YOUR CONCLUDING SLIDE 1388 01:00:20,880 --> 01:00:23,000 AND OF COURSE IN THE TWEET THAT 1389 01:00:23,000 --> 01:00:25,120 YOU KINDLY SENT. 1390 01:00:25,120 --> 01:00:27,080 SO THANK YOU VERY MUCH. 1391 01:00:27,080 --> 01:00:29,840 >>YOU'RE WELCOME. 1392 01:00:29,840 --> 01:00:34,880 >>CAN YOU COMMENT -- TO WHAT 1393 01:00:34,880 --> 01:00:37,360 EXTENT ARE THESE GROUPS WORKING 1394 01:00:37,360 --> 01:00:39,360 TOGETHER TO NOT ONLY PRESERVE 1395 01:00:39,360 --> 01:00:42,040 HISTORY BUT TO WRITE ABOUT IT?
1396 01:00:42,040 --> 01:00:43,560 >>THAT IS A GOOD QUESTION. 1397 01:00:43,560 --> 01:00:48,720 SO I WOULD HAVE TO DEFER TO MY 1398 01:00:48,720 --> 01:00:52,280 COLLEAGUES WHO ARE HISTORIANS 1399 01:00:52,280 --> 01:00:55,160 LIKE DR. IAN MILLIGAN. 1400 01:00:55,160 --> 01:00:57,160 WE MET HIM SEVERAL YEARS AGO AND 1401 01:00:57,160 --> 01:01:02,920 HE WAS ON THIS FOREFRONT AS A 1402 01:01:02,920 --> 01:01:05,040 HISTORIAN STUDYING AND USING WEB 1403 01:01:05,040 --> 01:01:07,480 ARCHIVES IN HIS RESEARCH IN 1404 01:01:07,480 --> 01:01:11,160 WRITING ABOUT HISTORY. 1405 01:01:11,160 --> 01:01:15,640 AND SO I WOULD DEFER TO SOMEONE 1406 01:01:15,640 --> 01:01:18,760 LIKE HIM FOR THAT QUESTION. 1407 01:01:18,760 --> 01:01:22,920 AND HE HAS ACTUALLY WRITTEN AND 1408 01:01:22,920 --> 01:01:25,480 BEEN AN EDITOR ON A COUPLE OF 1409 01:01:25,480 --> 01:01:28,720 VERY NICE BOOKS. 1410 01:01:28,720 --> 01:01:30,800 I BELIEVE THE WEB AS HISTORY IS 1411 01:01:30,800 --> 01:01:33,360 ONE OF THEM. 1412 01:01:33,360 --> 01:01:35,800 BUT THERE IS MORE AND MORE BEING 1413 01:01:35,800 --> 01:01:38,600 WRITTEN ABOUT USING WEB 1414 01:01:38,600 --> 01:01:40,000 ARCHIVES FOR HISTORICAL 1415 01:01:40,000 --> 01:01:40,280 RESEARCH. 1417 01:01:41,880 --> 01:01:43,880 >>THANK YOU. 1419 01:01:44,400 --> 01:01:46,160 ANOTHER QUESTION IS WHEN DID IT 1420 01:01:46,160 --> 01:01:48,840 OCCUR TO PEOPLE TO BEGIN TO 1421 01:01:48,840 --> 01:01:52,280 ARCHIVE THE WORLD WIDE WEB? 1422 01:01:52,280 --> 01:01:54,400 >>SO THE INTERNET ARCHIVE 1423 01:01:54,400 --> 01:01:56,440 ITSELF WAS STARTED IN 1996. 1424 01:01:56,440 --> 01:02:00,360 SO IT WAS PRETTY EARLY IN KIND 1425 01:02:00,360 --> 01:02:04,520 OF THE HISTORY OF THE WEB.
1426 01:02:04,520 --> 01:02:08,560 AND BREWSTER KAHLE STARTED THE 1427 01:02:08,560 --> 01:02:11,080 INTERNET ARCHIVE AND HE WANTED 1428 01:02:11,080 --> 01:02:15,000 TO BUILD A NEW ALEXANDRIA 1429 01:02:15,000 --> 01:02:15,240 LIBRARY. 1430 01:02:15,240 --> 01:02:17,000 CAN YOU CONTAIN ALL OF THE 1431 01:02:17,000 --> 01:02:19,520 WORLD'S KNOWLEDGE? 1432 01:02:19,520 --> 01:02:21,880 SO PART OF THAT AND THE INTERNET 1433 01:02:21,880 --> 01:02:23,960 ARCHIVE CONTAINS MANY THINGS 1434 01:02:23,960 --> 01:02:26,400 OTHER THAN JUST WEB ARCHIVES. 1435 01:02:26,400 --> 01:02:28,840 THEY HAVE MOVIES AND BOOKS AND 1436 01:02:28,840 --> 01:02:32,120 RECORDINGS. 1437 01:02:32,120 --> 01:02:33,920 BUT THE WEB WAS A LARGE PART OF 1438 01:02:33,920 --> 01:02:34,360 THAT. 1439 01:02:34,360 --> 01:02:37,720 SO HE HAD THAT VISION EARLY ON 1440 01:02:37,720 --> 01:02:39,520 THAT THIS WAS GOING TO BE AN 1441 01:02:39,520 --> 01:02:40,400 IMPORTANT RESOURCE FOR US. 1442 01:02:40,400 --> 01:02:42,960 AND WE'RE VERY GLAD THAT HE DID 1443 01:02:42,960 --> 01:02:44,160 THAT. 1444 01:02:44,160 --> 01:02:44,880 >>CERTAINLY. 1445 01:02:44,880 --> 01:02:46,600 THANK YOU. 1446 01:02:46,600 --> 01:02:48,840 CAN YOU SAY MORE ABOUT HOW THE 1447 01:02:48,840 --> 01:02:51,240 COVID PANDEMIC CHANGED THINKING 1448 01:02:51,240 --> 01:02:53,040 AND PRACTICE AROUND WEB 1449 01:02:53,040 --> 01:02:53,920 ARCHIVING? 1450 01:02:53,920 --> 01:02:57,200 ARE THERE ASPECTS THAT YOU THINK 1451 01:02:57,200 --> 01:02:59,200 WOULD NOT BE DONE NOW IF IT 1452 01:02:59,200 --> 01:03:01,320 WASN'T FOR THE 1453 01:03:01,320 --> 01:03:08,840 EXPERIENCE OF THE PANDEMIC?
1454 01:03:08,840 --> 01:03:09,960 >>I HAVEN'T THOUGHT ABOUT IT 1455 01:03:09,960 --> 01:03:12,960 THAT WAY BUT I THINK THERE ARE 1456 01:03:12,960 --> 01:03:18,720 DEFINITELY SOME THINGS, 1457 01:03:18,720 --> 01:03:20,920 EXPERIENCES THAT MAYBE WERE 1458 01:03:20,920 --> 01:03:22,160 MISSING BECAUSE SOME OF THE, 1459 01:03:22,160 --> 01:03:24,480 MANY OF THE THINGS THAT WE DO 1460 01:03:24,480 --> 01:03:26,800 NOW JUST LIKE WHAT WE'RE 1461 01:03:26,800 --> 01:03:28,280 CURRENTLY DOING ARE THINGS THAT 1462 01:03:28,280 --> 01:03:30,480 WE DIDN'T DO VERY OFTEN BEFORE 1463 01:03:30,480 --> 01:03:34,920 COVID IN TERMS OF THIS VIDEO 1464 01:03:34,920 --> 01:03:37,480 CAST AND THE ZOOM MEETINGS THAT 1465 01:03:37,480 --> 01:03:39,320 WE'VE HAD. 1466 01:03:39,320 --> 01:03:42,200 AND IT FEELS LIKE THIS IS NOT 1467 01:03:42,200 --> 01:03:43,760 NECESSARILY WEB ARCHIVING BUT 1468 01:03:43,760 --> 01:03:48,520 OUR COLLECTIVE EXPERIENCE 1469 01:03:48,520 --> 01:03:51,600 ARCHIVING AND HOW ARE PEOPLE 1470 01:03:51,600 --> 01:03:55,440 CAPTURING THINGS LIKE ZOOM 1471 01:03:55,440 --> 01:03:57,560 MEETINGS THAT HISTORIANS MAY 1472 01:03:57,560 --> 01:03:59,240 WANT TO GO BACK AND TALK ABOUT 1473 01:03:59,240 --> 01:03:59,960 THIS. 1474 01:03:59,960 --> 01:04:02,080 BECAUSE WE DEFINITELY ADDED SOME 1475 01:04:02,080 --> 01:04:04,640 NEW TERMINOLOGY TO OUR LEXICON 1476 01:04:04,640 --> 01:04:05,720 THAT WE'RE ALL FAMILIAR WITH 1477 01:04:05,720 --> 01:04:08,000 NOW. 1478 01:04:08,000 --> 01:04:09,120 >>THANK YOU. 1479 01:04:09,120 --> 01:04:11,640 SO WE HAVE TIME FOR ONE MORE 1480 01:04:11,640 --> 01:04:13,080 QUESTION, DR. WEIGLE. 1481 01:04:13,080 --> 01:04:15,760 IF I WANT TO DO WEB ARCHIVING 1482 01:04:15,760 --> 01:04:19,160 IN SIMILAR WAYS TO WHAT YOU 1483 01:04:19,160 --> 01:04:20,320 PRESENTED TODAY WHAT KIND OF 1484 01:04:20,320 --> 01:04:21,920 TRAINING DO I NEED? 1485 01:04:21,920 --> 01:04:24,280 WHERE DO I NEED TO GO TO SCHOOL?
1486 01:04:24,280 --> 01:04:26,320 WHAT KIND OF BACKGROUND DO I 1487 01:04:26,320 --> 01:04:28,760 NEED? 1488 01:04:28,760 --> 01:04:30,840 >>COME TO ODU. 1489 01:04:30,840 --> 01:04:34,080 I'LL MAKE A PITCH FOR US. 1490 01:04:34,080 --> 01:04:41,320 WE -- THERE ARE IN BOTH LIBRARY 1491 01:04:41,320 --> 01:04:44,400 SCHOOLS AND COMPUTER SCIENCE 1492 01:04:44,400 --> 01:04:46,000 SCHOOLS, WE COMBINE A LITTLE BIT 1493 01:04:46,000 --> 01:04:49,040 OF LIBRARY SCIENCE AND COMPUTER 1494 01:04:49,040 --> 01:04:51,040 SCIENCE IN WHAT WE DO. 1495 01:04:51,040 --> 01:04:53,160 WE HAVE TO BE AWARE OF TOPICS IN 1496 01:04:53,160 --> 01:04:55,560 BOTH AREAS. 1497 01:04:55,560 --> 01:04:57,040 AND HONESTLY AS A PITCH, WE'RE 1498 01:04:57,040 --> 01:04:59,880 DEVELOPING A WEB ARCHIVING 1499 01:04:59,880 --> 01:05:02,120 CERTIFICATE JOINT WITH OUR 1500 01:05:02,120 --> 01:05:04,920 LIBRARY PROGRAM HERE AT ODU. 1501 01:05:04,920 --> 01:05:08,320 SO WE'RE ACTIVELY THINKING ABOUT 1502 01:05:08,320 --> 01:05:11,360 IN TERMS OF WHAT DO WE NEED TO 1503 01:05:11,360 --> 01:05:14,120 GIVE TO STUDENTS TO GIVE THEM 1504 01:05:14,120 --> 01:05:16,360 THE BACKGROUND IN BOTH OF THOSE 1505 01:05:16,360 --> 01:05:18,040 ASPECTS SO THAT THEY ARE AWARE 1506 01:05:18,040 --> 01:05:22,480 OF ISSUES IN TERMS OF METADATA 1507 01:05:22,480 --> 01:05:26,360 IN COLLECTING AND ACCESS AND 1508 01:05:26,360 --> 01:05:29,680 THEN ALSO HAVE THE COMPUTER 1509 01:05:29,680 --> 01:05:31,920 SCIENCE SIDE TO BE ABLE TO 1510 01:05:31,920 --> 01:05:34,360 UNDERSTAND HOW THE PAGES ARE 1511 01:05:34,360 --> 01:05:36,960 ACTUALLY ARCHIVED AND REPLAYED. 1512 01:05:36,960 --> 01:05:38,160 >>THANK YOU. 1513 01:05:38,160 --> 01:05:39,920 AND CERTAINLY ALL OF THE BEST 1514 01:05:39,920 --> 01:05:41,840 WITH THE CERTIFICATE PROGRAM.
1515 01:05:41,840 --> 01:05:43,760 AND WE CERTAINLY WISH THIS 1516 01:05:43,760 --> 01:05:45,000 PERSON WHO WROTE THIS QUESTION 1517 01:05:45,000 --> 01:05:47,200 ALL OF THE BEST IN SEEKING A 1518 01:05:47,200 --> 01:05:52,720 CAREER IN WEB ARCHIVING AND 1519 01:05:52,720 --> 01:05:54,760 ARCHIVAL SCIENCE AND COMPUTER 1520 01:05:54,760 --> 01:05:56,160 SCIENCE AS WELL. 1521 01:05:56,160 --> 01:05:58,040 WITH THAT -- I WANT TO THANK YOU 1522 01:05:58,040 --> 01:05:59,800 VERY MUCH FOR YOUR ENGAGEMENT FOR 1523 01:05:59,800 --> 01:06:01,360 THE TIME THAT YOU TOOK TO 1524 01:06:01,360 --> 01:06:03,040 PREPARE YOUR PRESENTATION. 1525 01:06:03,040 --> 01:06:04,720 AND WE LOOK FORWARD TO KEEPING 1526 01:06:04,720 --> 01:06:06,400 IN TOUCH AND WE REALLY 1527 01:06:06,400 --> 01:06:08,000 APPRECIATE YOU TALKING ABOUT OUR 1528 01:06:08,000 --> 01:06:10,240 WEB ARCHIVING COLLECTION 1529 01:06:10,240 --> 01:06:11,760 ALONGSIDE SO MANY OTHERS THAT 1530 01:06:11,760 --> 01:06:14,080 HAVE BEEN DEVELOPED AND SO WELL 1531 01:06:14,080 --> 01:06:15,880 CURATED BY YOU AND YOUR 1532 01:06:15,880 --> 01:06:17,680 COLLEAGUES AND OF COURSE OUR 1533 01:06:17,680 --> 01:06:18,880 COLLEAGUES HERE AT THE NATIONAL 1534 01:06:18,880 --> 01:06:19,880 LIBRARY OF MEDICINE. 1535 01:06:19,880 --> 01:06:25,520 I WANT TO THANK CHRISTIE 1536 01:06:25,520 --> 01:06:25,760 MOFFETT. 1537 01:06:25,760 --> 01:06:26,720 THANK YOU VERY MUCH. 1538 01:06:26,720 --> 01:06:29,080 WE WISH YOU AND YOUR COLLEAGUES 1539 01:06:29,080 --> 01:06:29,760 ALL THE BEST. 1540 01:06:29,760 --> 00:00:00,000 THANK YOU.