Details

    • Type: Story
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: GSearch 2.4
    • Component/s: GSearch
    • Labels: None

      Description

      This issue was created in response to the messages below. I want to explore complex GSearch use cases and sketch or implement solutions based on existing and/or potential GSearch functionality. Such functionality includes many-repositories-to-many-indexes, indexing XSLT stylesheets that create index documents across Fedora datastreams and/or objects, managing GSearch configurations in Fedora objects (FCREPO-1018), and interaction between the resource index and the Lucene/Solr index(es) (FCREPO-1009).


      From: "ajs6f@virginia.edu" <ajs6f@virginia.edu>
      Date: 24 Oct 2011 02:19:24 CEST
      To: Support and info exchange list for Fedora users. <fedora-commons-users@lists.sourceforge.net>
      Subject: Re: [fcrepo-user] [fcrepo-dev] GSearch planning
      Reply-To: Support and info exchange list for Fedora users. <fedora-commons-users@lists.sourceforge.net>

      The intention of bringing the structure of the indexing workflow out of XSLT into the RDF relationships between objects is not primarily to provide for complex cases, although it can do that. It is, instead, to make that structure part of the curation of the objects themselves.

      The interest of this move follows on the claim that the presentation of objects increasingly is dependent on indexing (in part because so many "front-end" frameworks for Fedora rely on indexes to immediately construct many user-facing web pages, and not on direct retrieval from the repository, e.g. Hydra or Islandora), and that therefore indexing workflow deserves to be curated alongside data contents in the _strongest practical way_. I claim further that the strongest possible way to curate relationships between content datastreams and indexing transforms in a Fedora repository is in explicit RDF, and that this is practical.

      I quite agree that a powerful but unwieldy or opaque style of configuration may be worse than a weaker but more transparent style, but I believe that with enough thought and attention for the specific modeling of workflow, we could provide graceful factoring in configuration through which simple GSearch indexing workflows would incur very little expense (and even less than they now do) but sophisticated workflows remain possible.

      ---
      A. Soroka
      Online Library Environment
      the University of Virginia Library




      On Oct 23, 2011, at 8:05 PM, Conal Tuohy wrote:

      On 17/10/11 11:48, ajs6f@virginia.edu wrote:
      Heartily seconded!

      In the architecture we're exploring at UVa, we use RELS-INT to define relationships between datastreams and indexing transforms. The relevance to this issue lies in RELS-EXT. By indexing RELS-EXT as a datastream (and assuming that the molecular "para-object" that is responsible for a given index record is constructed via RELS-EXT relationships) we can obtain information about the other objects that may be involved in any index record with which a given object is associated. I'm in agreement that keeping the analysis of object relationships for indexing purposes in indexing XSLT is _not_ the best way, and instead we look to combine this technique with the use of Enhanced Content Model Views to create the kind of multiobject records to which Jonathan is pointing by hiding the explicit structure of the "para-object" from the indexing XSLT. This may or may not be the best possible solution for the problem, so I'm just offering it as a place to begin discussion.
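
As an illustration only, here is a minimal sketch of how relationships between datastreams and indexing transforms might be read out of a RELS-INT datastream. The predicate `idx:hasIndexingTransform`, its namespace, and the sample PIDs are hypothetical; the message does not say which predicate or objects the UVa architecture uses. Only the RDF/XML shape (an `rdf:Description` per datastream URI) follows the usual RELS-INT layout.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
IDX_NS = "http://example.org/indexing#"  # hypothetical namespace, for illustration

# Hypothetical RELS-INT content: each datastream points at the datastream
# holding the XSLT that should be used to index it.
RELS_INT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:idx="http://example.org/indexing#">
  <rdf:Description rdf:about="info:fedora/demo:1/MODS">
    <idx:hasIndexingTransform rdf:resource="info:fedora/demo:xslt-mods/XSLT"/>
  </rdf:Description>
  <rdf:Description rdf:about="info:fedora/demo:1/DC">
    <idx:hasIndexingTransform rdf:resource="info:fedora/demo:xslt-dc/XSLT"/>
  </rdf:Description>
</rdf:RDF>"""


def transforms_by_datastream(rels_int_xml):
    """Map each datastream URI to the list of transform URIs declared for it."""
    root = ET.fromstring(rels_int_xml)
    mapping = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for rel in desc.findall(f"{{{IDX_NS}}}hasIndexingTransform"):
            target = rel.get(f"{{{RDF_NS}}}resource")
            mapping.setdefault(subject, []).append(target)
    return mapping


if __name__ == "__main__":
    for datastream, transforms in transforms_by_datastream(RELS_INT).items():
        print(datastream, "->", ", ".join(transforms))
```

With the workflow expressed this way, an indexer would only need to look up the transform for a datastream rather than encode that decision inside the XSLT itself.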


      ---
      A. Soroka
      Online Library Environment
      the University of Virginia Library




      On Oct 16, 2011, at 8:15 PM, Jonathan Green wrote:

      Something that I think needs to be considered when moving forward with gsearch is that the index may not always share a 1 to 1 relationship with objects in fedora. In a very atomistic content model perhaps the solr document is actually composed of parts from many related objects. These types of decisions are currently very hard to make in XSLTs.
      In what way hard? Can you expand a little on the difficulties you see?

      While I think XSLTs have a place in transforming metadata, there needs to be something more.
      One issue to keep in mind here is the 80/20 rule. If Fedora's
      indexing system is complex enough to allow for all manner of complex
      cases, then it may be needlessly complex for many simple cases. A more
      complex system would make complex indexing easier, but if it also makes
      simpler cases harder (even just harder to understand a configuration
      system), then the OVERALL ease-of-use might actually decrease. I don't
      think it's possible to strike a perfect balance, but a technology like
      XSLT might be a useful catch-all: it can handle simple cases very
      simply, but can also be extended arbitrarily (including, for instance,
      transcluding metadata from related Fedora objects or other XML datasources).

      In very many cases, the mapping of Fedora objects to Solr documents is
      very simple and won't, for instance, involve any aggregation. But the
      mapping from Fedora objects to Solr documents is in principle arbitrary;
      you might choose to do pretty much anything, quite legitimately. You
      might have metadata schemas of any type; you might use the RDF store,
      you might have external authority files, etc. This is where, I think, a
      system which is sufficiently configurable to be fully general could well
      end up as complex as an XSLT-based system would be, but without many of
      the advantages of XSLT (code libraries, books and mail-lists, programmer
      experience, etc).

      It might be enough to ship Fedora with a basic set of XSLT transforms,
      and a few sample transforms showing how to use the resource index, etc.
      --

      Conal Tuohy
      eResearch Business Analyst
      Victorian eResearch Strategic Initiative
      +61-466324297

        Activity

        Gert Schmeltz Pedersen added a comment -
        I committed code on this issue yesterday, see

        https://github.com/fcrepo/gsearch/commit/ad671857380eec7f96ac9d75b1e248445bf7ea6e and

        https://github.com/fcrepo/gsearch/commit/eb06f3c357d475c72abdbe11070f699d84ab1f56

        It addresses the concern that "the index may not always share a 1 to 1 relationship with objects in fedora".

        The test case committed here has both
        - an index with contents from an iTQL query to risearch, thus combining or joining several Fedora objects,
        - and the new feature that there may be more than one index document per Fedora object; the example here has an index document for each of three datastreams.
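
To make the "more than one index document per Fedora object" case concrete, the following is a rough sketch, not the committed GSearch code. It builds a Solr XML update message containing one `<doc>` per datastream of a single object. The field names (`id`, `pid`, `dsid`, `text`), the `pid/dsid` identifier scheme, and the sample PID and datastream IDs are assumptions made for illustration; a real index would use whatever its Solr schema and indexing stylesheet define.

```python
import xml.etree.ElementTree as ET


def build_solr_add(pid, datastreams):
    """Return a Solr <add> message with one <doc> per datastream.

    `datastreams` maps a datastream ID to the text extracted from it.
    Field names are illustrative, not taken from a GSearch configuration.
    """
    add = ET.Element("add")
    for dsid, text in datastreams.items():
        doc = ET.SubElement(add, "doc")
        fields = (
            ("id", f"{pid}/{dsid}"),  # unique per datastream, not per object
            ("pid", pid),             # lets documents be grouped by object again
            ("dsid", dsid),
            ("text", text),
        )
        for name, value in fields:
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    message = build_solr_add(
        "demo:1",
        {"DC": "Dublin Core text", "MODS": "MODS text", "OCR": "Page text"},
    )
    print(message)
```

Keeping the object PID as its own field in every document is one way to preserve the link back to the Fedora object once the index is no longer 1 to 1 with the repository.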

          People

          • Assignee:
            Gert Schmeltz Pedersen
            Reporter:
            Gert Schmeltz Pedersen
          • Votes:
            0
            Watchers:
            2
