Microsoft Academic Knowledge API: Translation Notes

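I needed this documentation for a competition, but there was no Chinese translation = = so I had to roll up my sleeves and do it myself.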

Overview

Welcome to the Academic Knowledge API. With this service, you will be able to interpret user queries for academic intent and retrieve rich information from the Microsoft Academic Graph (MAG). The MAG knowledge base is a web-scale heterogeneous entity graph comprised of entities that model scholarly activities: field of study, author, institution, paper, venue, and event.

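Welcome to the Microsoft Academic Knowledge API. MAG lets you look up detailed information for academic queries. It is a seriously impressive, complex graph built out of entities, where the entities are academic things: fields of study, authors, papers, and so on.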

The MAG data is mined from the Bing web index as well as an in-house knowledge base from Bing. As a result of on-going Bing indexing, this API will contain fresh information from the Web following discovery and indexing by Bing. Based on this dataset, the Academic Knowledge API enables a knowledge-driven, interactive dialog that seamlessly combines reactive search with proactive suggestion experiences, rich research paper graph search results, and histogram distributions of the attribute values for a set of papers and related entities.

For more information on the Microsoft Academic Graph, see http://aka.ms/academicgraph.

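MAG's data is mined from Bing, so it stays fresh. That lets the API offer a knowledge-driven, interactive way of querying that seamlessly combines a bunch of very impressive techniques.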

Features

The Academic Knowledge API consists of three related REST endpoints:

  1. interpret – Interprets a natural language user query string. Returns annotated interpretations to enable rich search-box auto-completion experiences that anticipate what the user is typing.
  2. evaluate – Evaluates a query expression and returns Academic Knowledge entity results.
  3. calchistogram – Calculates a histogram of the distribution of attribute values for the academic entities returned by a query expression, such as the distribution of citations by year for a given author.

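The MAG API is made up of three REST endpoints:

  1. interpret - auto-completes what the user is typing into the search box
  2. evaluate - looks up the entities mentioned above
  3. calchistogram - returns a histogram of how attribute values are distributed across the query results (did Microsoft make this word up?)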

Used together, these API methods allow you to create a rich semantic search experience. Given a user query string, the interpret method provides you with an annotated version of the query and a structured query expression, while optionally completing the user’s query based on the semantics of the underlying academic data. For example, if a user types the string latent s, the interpret method can provide a set of ranked interpretations, suggesting that the user might be searching for the field of study latent semantic analysis, the paper latent structure analysis, or other entity expressions starting with latent s. This information can be used to quickly guide the user to the desired search results.

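Put these together and MAG gives you a really satisfying search experience. (Yep, that is all this paragraph is really trying to say.)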

The evaluate method can be used to retrieve a set of matching paper entities from the academic knowledge base, and the calchistogram method can be used to calculate the distribution of attribute values for a set of paper entities which can be used to further filter the search results.

evaluate is for searching papers; calchistogram computes attribute distributions that can be used to filter the search results.

Interpret Method

The interpret REST API takes an end user query string (i.e., a query entered by a user of your application) and returns formatted interpretations of user intent based on the Academic Graph data and the Academic Grammar.

To provide an interactive experience, you can call this method repeatedly after each character entered by the user. In that case, you should set the complete parameter to 1 to enable auto-complete suggestions. If your application does not need auto-completion, you should set the complete parameter to 0.

REST endpoint:

https://api.projectoxford.ai/academic/v1.0/interpret?

This API guesses the user's intent, and with complete=1 it can also auto-complete what they are typing.

Request Parameters

  • query
    • Type : Text string
    • Required
    • Query entered by user. If complete is set to 1, query will be interpreted as a prefix for generating query auto-completion suggestions
    • In other words, whatever the user typed
  • model
    • Type : Text string
    • default : latest
    • Name of the model that you wish to query.
    • The model the user wants to query (what is a model, though???)
  • complete
    • Type : 0 or 1
    • default : 0
    • 1 means that auto-completion suggestions are generated based on the grammar and graph data.
    • The auto-completion switch
  • count
    • Type : Number
    • default : 10
    • Maximum number of interpretations to return.
    • Like LIMIT in SQL
  • offset
    • Type : Number
    • default : 0
    • Index of the first interpretation to return.
    • Like OFFSET in SQL
  • timeout
    • Type : Number
    • default : 1000
    • Timeout in milliseconds. Only interpretations found before the timeout has elapsed are returned.
    • The timeout limit
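
To make the parameter list concrete, here is a minimal sketch that assembles an interpret request URL with Python's standard library. The function name build_interpret_url is my own and not part of the API; the endpoint, parameter names, and defaults are taken from the list above. It only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Endpoint as given in the documentation above.
INTERPRET_ENDPOINT = "https://api.projectoxford.ai/academic/v1.0/interpret"


def build_interpret_url(query, complete=0, count=10, offset=0,
                        timeout=1000, model="latest"):
    """Return an interpret URL; the defaults mirror the documented ones."""
    params = {
        "query": query,
        "model": model,
        "complete": complete,
        "count": count,
        "offset": offset,
        "timeout": timeout,
    }
    # urlencode percent-encodes the values and turns spaces into '+'.
    return INTERPRET_ENDPOINT + "?" + urlencode(params)


url = build_interpret_url("papers by jaime", complete=1, count=2)
print(url)
```

For a type-ahead search box, the same helper could be called with complete=1 on every keystroke, passing the text typed so far as query.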

Response (JSON)

  • query
    • The query parameter from the request.
    • The query you sent
  • interpretations
    • An array of 0 or more different ways of matching user input against the grammar.
    • The matched results
  • interpretations[x].logprob
    • The relative natural log probability of the interpretation. Larger values are more likely.
    • Relevance score; larger means more likely
  • interpretations[x].parse
    • An XML string that shows how each part of the query was interpreted.
    • Shows how the query was broken down
  • interpretations[x].rules
    • An array of 1 or more rules defined in the grammar that were invoked during interpretation. For the Academic Knowledge API, there will always be 1 rule.
    • The rules that matched (MAG only ever matches one rule?)
  • interpretations[x].rules[y].name
    • Name of the rule.
    • Name of the rule
  • interpretations[x].rules[y].output
    • Output of the rule.
    • What the rule outputs
  • interpretations[x].rules[y].output.type
    • The data type of the output of the rule. For the Academic Knowledge API, this will always be "query".
    • Type of the output, though it is always query
  • interpretations[x].rules[y].output.value
    • The output of the rule. For the Academic Knowledge API, this is a query expression string that can be passed to the evaluate and calchistogram methods.
    • The output itself; it can be passed straight to the evaluate and calchistogram endpoints
  • aborted
    • True if the request timed out.
    • Whether the request timed out

Example

https://api.projectoxford.ai/academic/v1.0/interpret?query=papers by jaime&complete=1&count=2

The response below contains the top two (because of the parameter count=2) most likely interpretations that complete the partial user input papers by jaime: papers by jaime teevan and papers by jaime green. The service generated query completions instead of considering only exact matches for the author jaime because the request specified complete=1. Note that the canonical value j l green matched via the synonym jaime green, as indicated in the parse.

An example that asks for the two most likely results. It even does synonym mapping, which is pretty much black magic.

{
  "query": "papers by jaime",
  "interpretations": [
    {
      "logprob": -12.728,
      "parse": "<rule name=\"#GetPapers\">papers by <attr name=\"academic#AA.AuN\">jaime teevan</attr></rule>",
      "rules": [
        {
          "name": "#GetPapers",
          "output": {
            "type": "query",
            "value": "Composite(AA.AuN=='jaime teevan')"
          }
        }
      ]
    },
    {
      "logprob": -12.774,
      "parse": "<rule name=\"#GetPapers\">papers by <attr name=\"academic#AA.AuN\" canonical=\"j l green\">jaime green</attr></rule>",
      "rules": [
        {
          "name": "#GetPapers",
          "output": {
            "type": "query",
            "value": "Composite(AA.AuN=='j l green')"
          }
        }
      ]
    }
  ]
}

To retrieve entity results for an interpretation, use output.value from the interpret API, and pass that into the evaluate API via the expr parameter. In this example, the query for the first interpretation is:

evaluate?expr=Composite(AA.AuN=='jaime teevan')

Once again it stresses that output.value can be handed directly to the evaluate API.
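
Pulling output.value out of an interpret response is mostly dictionary walking. The sketch below assumes the response has already been decoded from JSON into a Python dict; extract_expressions is a hypothetical helper name, and the sample is an abridged copy of the example above with the parse field left out.

```python
def extract_expressions(response):
    """Return (logprob, expression) pairs, most likely interpretation first."""
    pairs = []
    for interpretation in response.get("interpretations", []):
        for rule in interpretation["rules"]:
            output = rule["output"]
            # The docs say the type is always "query" for this API.
            if output["type"] == "query":
                pairs.append((interpretation["logprob"], output["value"]))
    # Larger logprob means a more likely interpretation.
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Abridged copy of the example response ("parse" omitted).
sample = {
    "query": "papers by jaime",
    "interpretations": [
        {
            "logprob": -12.728,
            "rules": [{"name": "#GetPapers", "output": {
                "type": "query",
                "value": "Composite(AA.AuN=='jaime teevan')"}}],
        },
        {
            "logprob": -12.774,
            "rules": [{"name": "#GetPapers", "output": {
                "type": "query",
                "value": "Composite(AA.AuN=='j l green')"}}],
        },
    ],
}

expressions = extract_expressions(sample)
best_logprob, best_expr = expressions[0]
print(best_expr)
```

The string in best_expr is what would go into the expr parameter of evaluate or calchistogram.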

Evaluate Method

The evaluate REST API is used to return a set of academic entities based on a query expression.

https://api.projectoxford.ai/academic/v1.0/evaluate?

The evaluate REST API is for looking up papers.

Request Parameters

  • expr
    • Type : Text string
    • Required
    • A query expression that specifies which entities should be returned.
    • The query expression to run, for example output.value from interpret
  • model
    • Type : Text string
    • default : latest
    • Name of the model that you wish to query.
    • The model the user wants to query (what is a model, though???)
  • attributes
    • Type : Text string
    • default : Id
    • A comma-delimited list that specifies the attribute values that are included in the response. Attribute names are case-sensitive.
    • List the attributes you want returned here
  • count
    • Type : Number
    • default : 10
    • Number of results to return.
    • Like LIMIT in SQL
  • offset
    • Type : Number
    • default : 0
    • Index of the first result to return.
    • Like OFFSET in SQL
  • orderby
    • Type : Text string
    • default : by decreasing prob
    • Name of an attribute that is used for sorting the entities. Optionally, ascending/descending can be specified. The format is: name:asc or name:desc.
    • Like ORDER BY in SQL
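
Because the expression contains quotes, parentheses, and equals signs, it is safer to let a library do the URL encoding. Here is a minimal sketch, assuming only the endpoint and parameter names listed above; build_evaluate_url is a made-up helper name, and the URL is decoded again at the end purely to show that the expression survives the round trip.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as given in the documentation above.
EVALUATE_ENDPOINT = "https://api.projectoxford.ai/academic/v1.0/evaluate"


def build_evaluate_url(expr, attributes=("Id",), count=10, offset=0,
                       orderby=None, model="latest"):
    """Return an evaluate URL; orderby is omitted to keep the default order."""
    params = {
        "expr": expr,
        "model": model,
        # The API expects a comma-delimited, case-sensitive attribute list.
        "attributes": ",".join(attributes),
        "count": count,
        "offset": offset,
    }
    if orderby is not None:
        params["orderby"] = orderby  # format: name:asc or name:desc
    return EVALUATE_ENDPOINT + "?" + urlencode(params)


url = build_evaluate_url(
    "Composite(AA.AuN=='jaime teevan')",
    attributes=("Ti", "Y", "CC", "AA.AuN", "AA.AuId"),
    count=2,
    orderby="Y:desc",
)
decoded = parse_qs(urlsplit(url).query)
print(decoded["expr"][0])
```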

Response (JSON)

  • expr
    • The expr parameter from the request.
    • The expr you sent
  • entities
    • An array of 0 or more entities that matched the query expression. Each entity contains a natural log probability value and the values of other requested attributes.
    • The matched results
  • aborted
    • True if the request timed out.
    • Whether the request timed out

Example

https://api.projectoxford.ai/academic/v1.0/evaluate?expr=Composite(AA.AuN=='jaime teevan')&count=2&attributes=Ti,Y,CC,AA.AuN,AA.AuId

Typically, an expression will be obtained from a response to the interpret method. But you can also compose query expressions yourself (see Query Expression Syntax).

Using the count and offset parameters, a large number of results may be obtained without sending a single request that results in a huge (and potentially slow) response. In this example, the request used the expression for the first interpretation from the interpret API response as the expr value. The count=2 parameter specifies that 2 entity results are being requested. And the attributes=Ti,Y,CC,AA.AuN,AA.AuId parameter indicates that the title, year, citation count, author name, and author ID are requested for each result. See Entity Attributes for a list of attributes.

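It repeats that the expression can come from the interpret method, then explains how the query results were obtained. The example shows the structure of entities in detail and is worth a look.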
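
The count/offset paging idea can be sketched as a small generator. Everything here is illustrative: iter_entities and fetch_page are hypothetical names, fetch_page stands for whatever function actually calls evaluate with a given offset and count, and stopping on a short page is my assumption about how to detect the end, since the documented response has no total count.

```python
def iter_entities(fetch_page, page_size=100, max_pages=50):
    """Yield entities page by page until a short page signals the end."""
    offset = 0
    for _ in range(max_pages):  # hard cap so a misbehaving source cannot loop forever
        entities = fetch_page(offset, page_size)
        yield from entities
        if len(entities) < page_size:
            break
        offset += page_size


# Stand-in for a real evaluate call: serves slices of a local list.
fake_results = [{"Id": n} for n in range(7)]


def fake_fetch(offset, count):
    return fake_results[offset:offset + count]


collected = list(iter_entities(fake_fetch, page_size=3))
print(len(collected))
```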

{
  "expr": "Composite(AA.AuN=='jaime teevan')",
  "entities": 
  [
    {
      "logprob": -15.08,
      "Ti": "personalizing search via automated analysis of interests and activities",
      "Y": 2005,
      "CC": 372,
      "AA": [
        {
          "AuN": "jaime teevan",
          "AuId": 1968481722
        },
        {
          "AuN": "susan t dumais",
          "AuId": 676500258
        },
        {
          "AuN": "eric horvitz",
          "AuId": 1470530979
        }
      ]
    },
    {
      "logprob": -15.389,
      "Ti": "the perfect search engine is not enough a study of orienteering behavior in directed search",
      "Y": 2004,
      "CC": 237,
      "AA": [
        {
          "AuN": "jaime teevan",
          "AuId": 1982462162
        },
        {
          "AuN": "christine alvarado",
          "AuId": 2163512453
        },
        {
          "AuN": "mark s ackerman",
          "AuId": 2055132526
        },
        {
          "AuN": "david r karger",
          "AuId": 2012534293
        }
      ]
    }
  ]
}
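
To show how the nested AA array might be consumed, here is a short sketch that flattens each entity into a plain row. It assumes the response was requested with attributes=Ti,Y,CC,AA.AuN,AA.AuId and already decoded from JSON; summarize_entities is a hypothetical helper, and the sample keeps only the first entity from the example above.

```python
def summarize_entities(response):
    """Flatten each entity into title, year, citation count, and author names."""
    rows = []
    for entity in response.get("entities", []):
        # AA is a list of author objects; attributes not requested are absent.
        authors = [author["AuN"] for author in entity.get("AA", [])]
        rows.append({
            "title": entity.get("Ti"),
            "year": entity.get("Y"),
            "citations": entity.get("CC"),
            "authors": authors,
        })
    return rows


# First entity of the example response above.
sample = {
    "expr": "Composite(AA.AuN=='jaime teevan')",
    "entities": [
        {
            "logprob": -15.08,
            "Ti": "personalizing search via automated analysis of interests and activities",
            "Y": 2005,
            "CC": 372,
            "AA": [
                {"AuN": "jaime teevan", "AuId": 1968481722},
                {"AuN": "susan t dumais", "AuId": 676500258},
                {"AuN": "eric horvitz", "AuId": 1470530979},
            ],
        }
    ],
}

rows = summarize_entities(sample)
print(rows[0]["authors"])
```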

CalcHistogram Method

The calchistogram REST API is used to calculate the distribution of attribute values for a set of paper entities.

The calchistogram REST API computes a histogram of how attribute values are distributed over the papers matched by a query.

https://api.projectoxford.ai/academic/v1.0/calchistogram?

Request Parameters

  • expr
    • Type : Text string
    • Required
    • A query expression that specifies the entities over which to calculate histograms.
    • The query expression to run, for example output.value from interpret
  • model
    • Type : Text string
    • default : latest
    • Name of the model that you wish to query.
    • The model the user wants to query (what is a model, though???)
  • attributes
    • Type : Text string
    • A comma-delimited list that specifies the attribute values that are included in the response. Attribute names are case-sensitive.
    • List the attributes you want histograms for here
  • count
    • Type : Number
    • default : 10
    • Number of results to return.
    • Like LIMIT in SQL
  • offset
    • Type : Number
    • default : 0
    • Index of the first result to return.
    • Like OFFSET in SQL

Response (JSON)

  • expr
    • The expr parameter from the request.
    • The expr you sent
  • num_entities
    • Total number of matching entities.
    • How many entities matched
  • histograms
    • An array of histograms, one for each attribute specified in the request.
    • The histograms; you get one per attribute you asked for
  • histograms[x].attribute
    • Name of the attribute over which the histogram was computed.
    • The attribute name
  • histograms[x].distinct_values
    • Number of distinct values among matching entities for this attribute.
    • Number of matched values (distinct)
  • histograms[x].total_count
    • Total number of value instances among matching entities for this attribute.
    • Number of matched values (duplicates counted)
  • histograms[x].histogram
    • Histogram data for this attribute.
    • The histogram data
  • histograms[x].histogram[y].value
    • A value for the attribute.
    • One value of the attribute in the histogram
  • histograms[x].histogram[y].logprob
    • Total natural log probability of matching entities with this attribute value.
    • How strongly that value matches
  • histograms[x].histogram[y].count
    • Number of matching entities with this attribute value.
    • How many entities matched that value
  • aborted
    • True if the request timed out.
    • Whether the request timed out

Example

https://api.projectoxford.ai/academic/v1.0/calchistogram?expr=And(Composite(AA.AuN=='jaime teevan'),Y>2012)&attributes=Y,F.FN&count=4

In this example, in order to generate a histogram of the count of publications by year for a particular author after 2012, we can first generate the query expression using the interpret API with query string: papers by jaime teevan after 2012.

https://api.projectoxford.ai/academic/v1.0/interpret?query=papers by jaime teevan after 2012

The expression in the first interpretation that is returned from the interpret API is And(Composite(AA.AuN=='jaime teevan'),Y>2012).

This expression value is then passed in to the calchistogram API. The attributes=Y,F.FN parameter indicates that the distributions of paper counts should be by Year and Field of Study, e.g.:

https://api.projectoxford.ai/academic/v1.0/calchistogram?expr=And(Composite(AA.AuN=='jaime teevan'),Y>2012)&attributes=Y,F.FN&count=4

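This looks up how many papers the author published each year after 2012, reminds us once more that the expression can come from the interpret API, and even thoughtfully includes an example of getting it from the interpret API (= =|| were they that worried I missed it the first time?)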
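
If you would rather compose the expression yourself than ask interpret for it, two tiny string helpers are enough to reproduce the one used here. This is only a sketch: author_expr and and_expr are my own names, only the two forms that appear in this document (Composite(AA.AuN=='...') and And(a,b)) are covered, and no escaping is attempted for names that contain quotes.

```python
def author_expr(name):
    """Build the author-name expression in the form shown in the examples."""
    return "Composite(AA.AuN=='{}')".format(name)


def and_expr(left, right):
    """Combine two expressions with And(...), as in the example above."""
    return "And({},{})".format(left, right)


expr = and_expr(author_expr("jaime teevan"), "Y>2012")
print(expr)
```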

The response to this request first indicates that there are 23 papers that match the query expression. For the Year attribute, there are 3 distinct values, one for each year after 2012 (i.e. 2013, 2014, and 2015) as specified in the query. The total paper count over the 3 distinct values is 37. For each Year, the histogram shows the value, total natural log probability, and count of matching entities.

The histogram for Field of Study shows that there are 34 distinct fields of study. As a paper may be associated with multiple fields of study, the total count (53) can be larger than the number of matching entities. Although there are 34 distinct values, the response only includes the top 4 because of the count=4 parameter.

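This explains where the results below come from; read it side by side with the request and it makes sense. The one thing that bugs me: how did it decide that 23 papers match this expression? What happened to the 37??? (The JSON reports num_entities as 37, so the 23 looks like a slip in the original docs.)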

{
  "expr": "And(Composite(AA.AuN=='jaime teevan'),Y>2012)",
  "num_entities": 37,
  "histograms": [
    {
      "attribute": "Y",
      "distinct_values": 3,
      "total_count": 37,
      "histogram": [
        {
          "value": 2014,
          "logprob": -15.753,
          "count": 15
        },
        {
          "value": 2013,
          "logprob": -15.805,
          "count": 12
        },
        {
          "value": 2015,
          "logprob": -16.035,
          "count": 10
        }
      ]
    },
    {
      "attribute": "F.FN",
      "distinct_values": 34,
      "total_count": 53,
      "histogram": [
        {
          "value": "crowdsourcing",
          "logprob": -15.258,
          "count": 9
        },
        {
          "value": "information retrieval",
          "logprob": -16.002,
          "count": 4
        },
        {
          "value": "personalization",
          "logprob": -16.226,
          "count": 3
        },
        {
          "value": "mobile search",
          "logprob": -17.228,
          "count": 2
        }
      ]
    }
  ]
}
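
A quick way to sanity-check the numbers in a calchistogram response is to add up the returned bucket counts and compare them with total_count. The sketch below assumes the response is already a Python dict; histogram_totals is a hypothetical helper, and the sample keeps only the fields it needs from the example above.

```python
def histogram_totals(response):
    """Map each attribute to (sum of returned bucket counts, reported total_count)."""
    totals = {}
    for hist in response.get("histograms", []):
        returned = sum(bucket["count"] for bucket in hist["histogram"])
        totals[hist["attribute"]] = (returned, hist["total_count"])
    return totals


# Abridged copy of the example response (logprob fields omitted).
sample = {
    "num_entities": 37,
    "histograms": [
        {"attribute": "Y", "distinct_values": 3, "total_count": 37,
         "histogram": [{"value": 2014, "count": 15},
                       {"value": 2013, "count": 12},
                       {"value": 2015, "count": 10}]},
        {"attribute": "F.FN", "distinct_values": 34, "total_count": 53,
         "histogram": [{"value": "crowdsourcing", "count": 9},
                       {"value": "information retrieval", "count": 4},
                       {"value": "personalization", "count": 3},
                       {"value": "mobile search", "count": 2}]},
    ],
}

totals = histogram_totals(sample)
print(totals)
```

For Y the three buckets add up to the full total of 37, which also equals num_entities. For F.FN only 18 of the 53 value instances are covered, because count=4 cuts the list to the top four of 34 distinct fields.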