  1. DSpace
  2. DS-1008

Solr Statistics markRobotsByIP can mark too many IP addresses, including IPs not on the IP list

    Details

    • Type: Bug
    • Status: Accepted / Claimed
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1, 1.7.2
    • Fix Version/s: None
    • Component/s: Solr
    • Labels: None
    • Attachments: 1
    • Comments: 3
    • Documentation Status: Needed

      Description

      The function markRobotsByIP marks too many IPs as bots, potentially by a factor of 9, because it appends a trailing wildcard to every IP on the list.

      https://github.com/DSpace/DSpace/blob/5366d237afa07005ec485831c9bca1f1c992f01d/dspace-stats/src/main/java/org/dspace/statistics/SolrLogger.java#L473
      /* query for ip, exclude results previously set as bots. */
      processor.execute("ip:" + ip + "* AND -isBot:true");

      ip* would expand:
      10.10.10* to 10.10.[10, 100-109].*
      10.10.10.10* to 10.10.10.[10, 100-109]
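The expansion above can be reproduced without Solr. The following is a minimal sketch that assumes the trailing wildcard behaves like a plain string-prefix match on the ip field, as described in this issue. The class and method names (WildcardExpansionDemo, matchingAddresses) are hypothetical and are not part of DSpace.

```java
import java.util.ArrayList;
import java.util.List;

final class WildcardExpansionDemo {

    // Enumerates every address firstThreeOctets.0 .. firstThreeOctets.255 and keeps
    // those that start with the given prefix. This imitates "ip:<prefix>*" under the
    // assumption that the wildcard is a simple string-prefix match.
    static List<String> matchingAddresses(String firstThreeOctets, String prefix) {
        List<String> matches = new ArrayList<>();
        for (int last = 0; last <= 255; last++) {
            String address = firstThreeOctets + "." + last;
            if (address.startsWith(prefix)) {
                matches.add(address);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // A single listed bot IP, queried the way markRobotsByIP does (with "*").
        List<String> matches = matchingAddresses("10.10.10", "10.10.10.10");
        System.out.println(matches.size() + " addresses match: " + matches);
    }
}
```

For the listed address 10.10.10.10, the sketch reports 11 matching addresses: 10.10.10.10 itself plus 10.10.10.100 through 10.10.10.109.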

      My co-worker Brian Stamper suggested:
      if (ip.matches("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+")) {
          // Full 4-octet address: run as-is, with no wildcard.
          processor.execute("ip:" + ip + " AND -isBot:true");
      } else if (ip.endsWith(".")) {
          // Not a full address, but ends in a period (something like #.#.#. or #.#.).
          // I don't expect this in the "stock" input from ip-list.com.
          processor.execute("ip:" + ip + "* AND -isBot:true");
      } else if (ip.matches(".*[0-9]")) {
          // Ends with a digit and is not a full 4-octet address, so we append .*
          processor.execute("ip:" + ip + ".* AND -isBot:true");
      } else {
          log.error("Unexpected IP value: " + ip);
      }
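The branching suggested above could also be factored into a helper that only builds the query string, so it can be checked without a running Solr instance. This is a minimal sketch under that assumption. RobotIpQuery and build are hypothetical names, not existing DSpace code, and the patterns rely on String.matches requiring the whole string to match.

```java
import java.util.Optional;

final class RobotIpQuery {

    // Builds the Solr query for one entry of the bot IP list, or returns
    // Optional.empty() when the entry has an unexpected shape.
    static Optional<String> build(String ip) {
        if (ip.matches("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+")) {
            // Full 4-octet address: exact match, no wildcard.
            return Optional.of("ip:" + ip + " AND -isBot:true");
        } else if (ip.endsWith(".")) {
            // Partial address that already ends on an octet boundary.
            return Optional.of("ip:" + ip + "* AND -isBot:true");
        } else if (ip.matches(".*[0-9]")) {
            // Partial address ending in a digit: close the octet before the wildcard.
            return Optional.of("ip:" + ip + ".* AND -isBot:true");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String ip : new String[] {"10.10.10.10", "10.10.10", "10.10.", "not-an-ip"}) {
            System.out.println(ip + " -> " + build(ip).orElse("(unexpected IP value)"));
        }
    }
}
```

The caller would pass a present result to processor.execute and log the error case. The sketch does not validate octet ranges or reject entries with more than four octets.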

            People

            • Assignee: Peter Dietz (peterdietz)
            • Reporter: Peter Dietz (peterdietz)
            • Votes: 0
            • Watchers: 1
