Let’s get practical — Part 3 (capturing text)

As I mentioned in Let’s get practical — Part 2, I follow the practice of capturing information from sources (mostly online) and saving it in discrete, labeled chunks. I use a similar process for Insights: Write an idea down as a statement and associate it with explanatory text.

Sometimes I capture these chunks directly in TheBrain Pro, my primary tool for gathering, organizing, and browsing “knowledge.” TheBrain often does a good job of capturing the location and the name/title of documents: where the source page has a <title> element, TheBrain uses its content as the name of a “Thought” (in Brain terminology). But there’s no option to grab a specific zone of text at the same time, and that’s often what you want most.

Annotation and note-taking tools

In this post, I describe — very briefly — two similar products for capturing text and metadata from resources:

  • QuoteURLText for Mozilla Firefox
  • CintaNotes for Windows

These are not comprehensive product reviews! I strongly encourage you to try these tools yourself. They are free unless otherwise noted. Many vendors that sell paid upgrades also offer a free basic version. I have chosen only products whose “basic-level” functionality is substantial, so you do not need to upgrade to the more feature-rich variants.

Where to look for tools for capturing text and metadata

You may also want to consider products like Zotero for capturing metadata. Zotero offers a rich set of features for capturing and managing bibliographic data.

So, what would you like to capture most from electronic resources?

As with all activities involved in the capture and management of “knowledge,” you will want tools that make your work easier … and more precise. But how are you going to do that without some model for what you are capturing and managing?

Much of our “knowledge work” consists of research — finding, capturing, and quoting information from a range of resources.

When you find something of interest, you will want tools that allow you to capture as much information as possible, including, ideally:

  • The location and filename of the document from which you are quoting, often a Web URL or a file on your computer or local network.
  • The title of the document from which you are taking the information. The HTML <title> element from the source page is often a good source for the document title, but files in other formats lack this cue. Sometimes the title is the first line of the file.
  • An option to create a short description or paraphrase of the captured text. You often want to provide a meaningful name for a captured snippet or image, and that name will differ from the title of the document from which you are capturing information.
  • Date and maybe even time of capture.
  • Author and organization/affiliation information.
  • Other bibliographic information.
  • The strings of text you want to quote.

    • Most of us understand guidelines that help us to avoid plagiarism.
    • Some tools capture style information; others do not. Both approaches have their problems. There are workarounds in many cases.
  • Images, too. But the guidelines may be less clear. You need to use those images responsibly. See, for example, Follow This Chart to Know If You Can Use an Image from the Internet.
  • Comments on the captured information. This may include restating the captured text more clearly and succinctly or explaining a graphic.
  • “Tagging.” I’m not a big proponent of tagging — inserting keywords in [typically] non-systematic ways — but tags can be useful at times.
  • Name of the person who performed the capture (and may have added comments).

You’re not likely to find all these features in any particular annotation or note-taking tool, but they are still useful.
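
To make the list above concrete, here is a minimal sketch of one captured chunk as a Python record. The class name and every field name are my own illustrative assumptions, not the data model of any tool discussed here, and images and “other bibliographic information” are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class CapturedSnippet:
    """One captured chunk plus its metadata (hypothetical field names)."""
    source_location: str                  # URL or file path of the source document
    document_title: str                   # e.g. content of the HTML <title> element
    quoted_text: str                      # the exact string being quoted
    short_description: str = ""           # your own label for the snippet
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )                                     # date and time of capture
    author: Optional[str] = None
    organization: Optional[str] = None
    comments: str = ""
    tags: List[str] = field(default_factory=list)
    captured_by: Optional[str] = None     # person who performed the capture

    def missing_fields(self) -> List[str]:
        """Names of optional fields that are still empty."""
        optional = ["short_description", "author", "organization",
                    "comments", "tags", "captured_by"]
        return [name for name in optional if not getattr(self, name)]


# Made-up example values; a real capture tool would fill these in.
snippet = CapturedSnippet(
    source_location="https://example.com/article.html",
    document_title="An Example Article",
    quoted_text="A sentence worth quoting.",
    tags=["example"],
)
print(snippet.missing_fields())
```

A helper like missing_fields() is one way to see, for each capture, which items you would still have to add by hand.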

Some online publications make extensive capture of metadata possible by embedding that information according to known standards. For example, The New York Times online edition makes some use of the Schema.org structured vocabulary.
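
As a rough illustration of what embedded metadata makes possible, the sketch below pulls the <title> text and any Schema.org data out of a page, assuming the page embeds that data as JSON-LD (one of the syntaxes Schema.org supports). The sample page, the author name, and the date are invented for the example; I have not checked which syntax any particular publication uses.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the <title> text and any JSON-LD blocks from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.json_ld = []            # parsed JSON-LD objects
        self._in_title = False
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole capture


# Invented sample page; real pages will carry different properties.
SAMPLE_PAGE = """
<html><head>
<title>An Example Article</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "NewsArticle",
 "headline": "An Example Article",
 "author": {"@type": "Person", "name": "Jane Doe"},
 "datePublished": "2017-01-15"}
</script>
</head><body><p>Body text.</p></body></html>
"""

parser = JsonLdExtractor()
parser.feed(SAMPLE_PAGE)
article = parser.json_ld[0]
print(parser.title)
print(article["author"]["name"], article["datePublished"])
```

When a page carries this kind of markup, author and publication date can be captured automatically instead of being typed in.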

Mapping data captured in CintaNotes and other tools to a model for representing knowledge

If you start your knowledge-capture process by capturing information from sources (mostly online) and saving it in discrete, labeled chunks, you’ll want a target model for that information. My model for representing practical knowledge is ICKMOD, which I describe at some length in this blog.

Even if you do not have an explicit target model, I suspect you have an implicit model for information that will be incorporated into larger resources, either for your own purposes or for sharing with your organization. So don’t let the formalities get in the way of capturing knowledge.

But a small dose of consistency up front can save you a lot of time and effort later … when you ultimately find tools that work well for you and others for constructing knowledgebases.

See also, Incremental Formalization.

CintaNotes for Windows

The free version of CintaNotes allows you to “Take notes from anywhere and automatically organize them.” Pro and Lifetime Pro versions provide additional functionality … for a fee. But the base free product is very useful for capturing text and associated metadata.

Important pluses of CintaNotes

  • The content of the HTML <title> element is captured automatically as a CintaNotes Title.
  • In some cases, at least, document titles are captured from PDF files.
  • Other important data is captured automatically
    • The URL of online material. One click gets you to the source document.
    • The “name” of that document
  • Text is captured from most common types of content — including from any Web browser
    • But styles are lost. (That’s not always bad.)
  • You can export some or all the notes you capture in a text format.
  • CintaNotes is very easy to use. Highlight (select) a zone of text, press Ctrl+F12, modify a couple of fields (tags, remarks) if you wish, click OK, and much of your work is done.

Important limitations of CintaNotes

  • The exported notes do not contain specific markup for all fields, but it is possible to apply a script that would add markup and rearrange content based on field labels, syntax of strings, distinctive punctuation, and line-ends. (I use Perl scripts for such purposes.)
  • Author information is not captured automatically.
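
Here is a minimal sketch of the kind of post-processing script mentioned above, written in Python rather than the Perl I actually use. The export layout it assumes (a title line, free text, optional “Source:” and “Tags:” lines, and a blank line between notes) is hypothetical, not the actual CintaNotes export format, and the element names are made up as well.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical export layout, invented for this example.
SAMPLE_EXPORT = """\
An Example Article
A sentence worth quoting.
Source: https://example.com/article.html
Tags: example, capture

Second Note
Another quoted passage, with <angle brackets>.
Source: https://example.com/other.html
"""

# Lines that begin with one of these field labels become their own element.
LABELLED = re.compile(r"^(Source|Tags):\s*(.*)$")


def mark_up(export_text: str) -> str:
    """Wrap each blank-line-separated note in simple XML-style markup."""
    notes = []
    for block in re.split(r"\n\s*\n", export_text.strip()):
        lines = block.splitlines()
        fields = {"title": lines[0], "text": []}   # first line is the title
        for line in lines[1:]:
            match = LABELLED.match(line)
            if match:
                fields[match.group(1).lower()] = match.group(2)
            else:
                fields["text"].append(line)        # unlabeled lines are quoted text
        parts = [f"  <title>{escape(fields['title'])}</title>",
                 f"  <text>{escape(' '.join(fields['text']))}</text>"]
        for label in ("source", "tags"):
            if label in fields:
                parts.append(f"  <{label}>{escape(fields[label])}</{label}>")
        notes.append("<note>\n" + "\n".join(parts) + "\n</note>")
    return "\n".join(notes)


print(mark_up(SAMPLE_EXPORT))
```

The same approach (split on line-ends, recognize field labels, escape the text, wrap it in elements) carries over to whatever layout your tool really exports.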

QuoteURLText Add-on for Firefox

QuoteURLText is a free add-on for the Mozilla Firefox browser. Like CintaNotes, it captures selected text and metadata from files open in Firefox so that you can paste them into other applications.

Important pluses of QuoteURLText

  • The content of the <title> element is captured automatically from HTML files.
  • In some cases, at least, document titles are captured from PDF files.
  • Other important data is captured automatically
    • The URL of online material.
    • The “name” of that document
  • Text is captured from common types of content that can be opened with the Firefox Web browser.
    • Text formatting can be retained when pasted into another application that supports the formatting of the source. (That’s not always good, but it can be very handy if you use HTML as your standard tool for writing. Perhaps good for Microsoft Word and Word-compatible word processors? Not tested.)
  • The order of metadata elements captured is definable.
  • QuoteURLText is very easy to use. Select text in the browser, right-click and select Quote text, then paste it into another application.

Important limitations of QuoteURLText

  • This is a one-time operation — that is, it does not build an editable list of extracted text and metadata. Of course, you could paste each item into a text editor or an HTML editor and then work from that file, but CintaNotes provides greater editing functionality within the mini-database of extracts that you build, including selective display by tags.
  • QuoteURLText does not support tagging.
  • The exported notes do not contain specific markup for all fields, but it is possible to apply a script that would add markup and rearrange content based on field labels, syntax of strings, distinctive punctuation, and line-ends. (I use Perl scripts for such purposes.)
  • Author information is not captured automatically.
  • QuoteURLText is available only as an add-on for Mozilla Firefox, as far as I can tell. (But that’s my favorite browser.)

QuoteURLText is my favorite quote-and-metadata capture tool at the moment. CintaNotes offers a wider range of features, but its free version will not capture links or formatting data in the captured quotes.

© Copyright 2017 Philip C. Murray

 
