WebAtRisk_20070530


NDIIPP Web-at-Risk Curators' Meeting

Oakland, CA

05.30.2007

 

 

  • LOC: article about our collection development plans on NDIIPP website
  • talk about recent NDIIPP "funding kerfuffle"
  • Dec 2007: WAS usable release date
  • Jan/June 2008: end-user phase

 

Next Step

  • be promoters of the service
  • communicate to our institutions the value of this service

 

(Tracy Seneca gave project update)

 

WAS Demo of Release 4 (Tracy Seneca)

  • tips @ right of screen (some open to detailed guides)
  • host site + linked pages
    • grabs external *pages*, not entire site
    • Question: does it also grab the images/elements/CSS that make up that page?
  • descriptive data *will* include auto-captured data (examples in later release)
  • "delete" doesn't delete captured content; it only deletes the capture action for that site
    • hence, the option to limit the site list under "Manage Sites" to "inactive sites"
  • "view job history" = capture dates and various results
  • "View Results" = view successful captures
    • will display most recent job at top
  • must choose capture date; will make site title non-clickable to avoid confusion
  • timestamps currently in PST (working to change this)
  • results screen will always have some lag the first time (a lot of data to load), but will load quickly thereafter (cached)
  • "start/finish time" is slightly different from "duration":
    • s/f includes time spent in queue, etc.
    • duration only counts active crawler time
  • Robots Exclusion
    • Heritrix will capture the *rules* even if it can't capture the site (there is help to decipher these files)
    • common exclusion = CSS files
  • "response code" reports
    • may be more meaningful for long-term captures (health of a site)
  • "Hosts" report
    • lists subdomains
    • lists all hosts
  • will also be developing FAQ
  • will provide search tips
    • abstract = auto-generated (portion of site included; search terms?)
    • full-text search capability
    • can add individual documents to a collection from the search tab
  • display top bar will show navigation to various (dated) versions
  • can add *comment* to user-defined metadata (for curator view)
  • will add sorting to file list under "results"
  • included/excluded under "collection"
    • good way to exclude a few non-public domain files from a site
  • time delay between collection building and viewing/searching contents
    • currently trying to predict this amount of time
    • in the meantime, there may appear to be a mismatch between "contents" and "files"
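
The difference between "start/finish time" and "duration" noted in the demo can be illustrated with a minimal sketch. The job record, its field names, and the sample times below are all hypothetical and are not taken from WAS; the sketch only assumes that start/finish spans the whole job (queue time included) while duration sums the periods when the crawler was actively working.

```python
from datetime import datetime, timedelta

# Hypothetical job record; field names and values are illustrative only.
job = {
    "start": datetime(2007, 5, 30, 9, 0, 0),     # job enters the queue
    "finish": datetime(2007, 5, 30, 10, 30, 0),  # job reported complete
    "crawl_intervals": [                         # periods of active crawling
        (datetime(2007, 5, 30, 9, 20, 0), datetime(2007, 5, 30, 9, 50, 0)),
        (datetime(2007, 5, 30, 10, 0, 0), datetime(2007, 5, 30, 10, 25, 0)),
    ],
}


def wall_clock(job):
    """Start-to-finish time, including time spent waiting in the queue."""
    return job["finish"] - job["start"]


def active_duration(job):
    """Total time the crawler was actively working."""
    return sum(
        (end - begin for begin, end in job["crawl_intervals"]),
        timedelta(),
    )


elapsed = wall_clock(job)
active = active_duration(job)
print("start/finish:", elapsed)
print("duration:", active)
print("queue and other overhead:", elapsed - active)
```

Under these assumptions the start-to-finish figure is always at least as large as the duration, which is why the two columns in a results screen need not agree.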

 

note: CC: way to manage list of captures
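
As a rough illustration of the Robots Exclusion notes from the demo (a site excluding its CSS files), the sketch below uses Python's standard `urllib.robotparser` to evaluate a made-up robots.txt. The rules, the example.org URLs, and the `example-crawler` agent name are assumptions for illustration; this is not how Heritrix or WAS evaluates exclusions internally.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that excludes the stylesheet directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="example-crawler"):
    """Return True if the rules permit this (hypothetical) agent to fetch url."""
    return parser.can_fetch(agent, url)


page_ok = allowed("http://www.example.org/index.html")
css_ok = allowed("http://www.example.org/css/site.css")

print("page allowed:", page_ok)
print("stylesheet allowed:", css_ok)
```

With rules like these, a crawler that honors robots.txt can capture the page itself but not its stylesheet, so a captured page may display without its original styling.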

 

Usability Testing (Kathleen Murray)
