NDIIPP Web-at-Risk Curators' Meeting
Oakland, CA
05.30.2007
- LOC: article about our collection development plans on NDIIPP website
- talk about recent NDIIPP "funding kerfuffle"
- Dec 2007: WAS usable release date
- Jan/June 2008: end-user phase
Next Step
- be promoters of the service
- communicate to our institutions the value of this service
(Tracy Seneca gave project update)
WAS Demo of Release 4 (Tracy Seneca)
- tips @ right of screen (some open to detailed guides)
- host site + linked pages
- grabs external *pages*, not entire site
- Question: does it grab the images/elements/CSS that make up that page, as well?
- descriptive data *will* include auto-captured data (examples in later release)
- "delete" doesn't delete captured content; it only deletes the capture action for that site
- hence, the option to limit the site list under "Manage Sites" to "inactive sites"
- "view job history" = capture dates and various results
- "View Results" = view successful captures
- will display most recent job at top
- must choose capture date; will make site title non-clickable to avoid confusion
- timestamps currently in PST (working to change this)
- results screen will always have some lag the first time (a lot of data to load), but will load quickly thereafter (cached)
- "start/finish time" slightly different from "duration":
- s/f includes time spent in queue, etc.
- duration only counts active crawler time
- Robots Exclusion
- Heritrix will capture the *rules* even if it can't capture the site (there is help to decipher these files)
- common exclusion = CSS files
- "response code" reports
- may be more meaningful for long-term captures (health of a site)
- "Hosts" report
- lists subdomains
- lists all hosts
- will also be developing FAQ
- will provide search tips
- abstract = auto-generated (portion of site included; search terms?)
- full-text search capability
- can add individual documents to a collection from the search tab
- display top bar will show navigation to various (dated) versions
- can add *comment* to user-defined metadata (for curator view)
- will add sorting to file list under "results"
- included/excluded under "collection"
- good way to exclude a few non-public domain files from a site
- time delay between collection building and viewing/searching contents
- currently trying to predict this amount of time
- or may appear to be a mismatch between "contents" and "files"
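
The difference between "start/finish time" and "duration" noted above can be illustrated with a small sketch. This is not WAS code; the function name and the sample times are hypothetical, chosen only to show that the gap between the two figures is time spent in the queue (or otherwise not actively crawling).

```python
from datetime import datetime, timedelta

def queue_overhead(start: datetime, finish: datetime, duration: timedelta) -> timedelta:
    """Return the part of the start-to-finish span that was not active crawling.

    start/finish: wall-clock times that include time spent in the queue.
    duration: active crawler time only.
    """
    elapsed = finish - start
    return elapsed - duration

# Hypothetical job: 45 minutes from start to finish, 30 minutes of active crawling.
start = datetime(2007, 5, 30, 9, 0, 0)
finish = datetime(2007, 5, 30, 9, 45, 0)
duration = timedelta(minutes=30)

overhead = queue_overhead(start, finish, duration)
print(overhead)  # 0:15:00
```

Under these assumed numbers, a curator would see a start/finish span 15 minutes longer than the reported duration.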
note: CC: way to manage list of captures
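
For curators deciphering a captured robots exclusion file, the following is a minimal sketch of how a rule that excludes CSS files behaves. It uses Python's standard `urllib.robotparser`, not Heritrix itself; the rules, the crawler name, and the example.org URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks all crawlers from the /css/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url: str, agent: str = "example-crawler") -> bool:
    """Report whether the parsed rules permit the given agent to fetch the URL."""
    return parser.can_fetch(agent, url)

print(allowed("http://www.example.org/index.html"))    # True
print(allowed("http://www.example.org/css/site.css"))  # False
```

With rules like these, a crawler that honors robots exclusion could capture the page but not its stylesheet, which may explain captures that display without their original styling.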
Usability Testing (Kathleen Murray)