<?xml version="1.0" encoding="UTF-8"?><rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>introduction Scientometrics Archives - Francesco Lelli %</title> <atom:link href="https://francescolelli.info/tag/introduction-scientometrics/feed/" rel="self" type="application/rss+xml" /><link>https://francescolelli.info/tag/introduction-scientometrics/</link> <description>Information Management, Computer Science,  Economics, Finance and more</description> <lastBuildDate>Sun, 24 Nov 2019 11:25:09 +0000</lastBuildDate> <language>en-US</language> <sy:updatePeriod> hourly </sy:updatePeriod> <sy:updateFrequency> 1 </sy:updateFrequency> <generator>https://wordpress.org/?v=6.8.5</generator><image> <url>https://francescolelli.info/wp-content/uploads/2018/11/cropped-InstrumentElement-32x32.jpg</url><title>introduction Scientometrics Archives - Francesco Lelli %</title><link>https://francescolelli.info/tag/introduction-scientometrics/</link> <width>32</width> <height>32</height> </image> <site
xmlns="com-wordpress:feed-additions:1">156264324</site> <item><title>Scientometrics and Scientific Citations in Patent a database</title><link>https://francescolelli.info/generic/scientometrics-and-scientific-citations-in-patent-a-database/</link> <comments>https://francescolelli.info/generic/scientometrics-and-scientific-citations-in-patent-a-database/#respond</comments> <dc:creator><![CDATA[Francesco Lelli]]></dc:creator> <pubDate>Tue, 08 Jan 2019 15:18:22 +0000</pubDate> <category><![CDATA[Generic]]></category> <category><![CDATA[introduction Scientometrics]]></category> <category><![CDATA[patent database]]></category> <category><![CDATA[PATSTAT]]></category> <category><![CDATA[scientometrics]]></category> <guid
isPermaLink="false">https://francescolelli.info/?p=360</guid><description><![CDATA[<p>Scientometrics is a science that provides quantitative measures for evaluation of scientific output through analysis of bibliographic information. It is most commonly used in studies on impact and reach of scientific works. In economics its most prominent use can be found in policy evaluation and research on innovation. Such studies make use of the fact [&#8230;]</p><p>The post <a
href="https://francescolelli.info/generic/scientometrics-and-scientific-citations-in-patent-a-database/">Scientometrics and Scientific Citations in Patent a database</a> appeared first on <a
href="https://francescolelli.info">Francesco Lelli</a>.</p> ]]></description> <content:encoded><![CDATA[<p>Scientometrics is a science that provides quantitative measures for evaluating scientific output through the analysis of bibliographic information. It is most commonly used in studies on the impact and reach of scientific works. In economics, its most prominent use can be found in policy evaluation and research on innovation. Such studies make use of the fact that bibliographic data is often linked to other economic phenomena. One example of such a relation is the citation of scientific publications in patent publications. Figure 1 shows an excerpt from a patent publication.</p><figure
class="wp-block-image"><img
fetchpriority="high" decoding="async" width="814" height="550" data-attachment-id="361" data-permalink="https://francescolelli.info/generic/scientometrics-and-scientific-citations-in-patent-a-database/attachment/scientific-citation-in-a-patent-publication/" data-orig-file="https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication.png" data-orig-size="814,550" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Scientific Citation in a patent publication" data-image-description="&lt;p&gt;Fig 1. scientific citations in a patent publication do not allow scientometric measures&lt;/p&gt;
" data-image-caption="&lt;p&gt;Fig 1. scientific citations in a patent publication do not allow scientometric measures&lt;/p&gt;
" data-medium-file="https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication-300x203.png" data-large-file="https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication.png" src="https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication.png?8011c3&amp;8011c3" alt="Fig 1. scientific citations in a patent publication do not allow scientometric measures" class="wp-image-361" srcset="https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication.png 814w, https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication-300x203.png 300w, https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication-768x519.png 768w, https://francescolelli.info/wp-content/uploads/2019/01/Scientific-Citation-in-a-patent-publication-600x405.png 600w" sizes="(max-width: 814px) 100vw, 814px" /><figcaption>Fig 1. scientific citations in a patent publication do not allow scientometric measures</figcaption></figure><p>Scientometrics is a way to measure how science, technology, and the economy are connected with each other. Understanding how they are connected constitutes valuable economic knowledge. However, to understand the connections between these phenomena, a suitable methodological environment needs to be created.</p><p>The following example represents a classic use case of scientometric investigation.</p><p>PATSTAT is a product of the European
Patent Office (EPO, <a
href="https://www.epo.org">https://www.epo.org</a>).
It is a periodic snapshot of patent-related information organized in a
relational database model. Records on patent applications, their applicants, and
publications are available. A table with the code name “tls214_npl_publn” (often
referred to as TLS214) stores information on bibliographic references like the
one shown in Figure 1. This table contains more than 30 million records. The
records, however, are often duplicated or inaccurate. Moreover, a full
bibliographic reference is stored in a single attribute. This makes it
difficult to query the table for relevant information, for example, to
retrieve an author’s name or the date of a specific publication.</p><p><em>Disambiguation</em> in the context of data management refers to <em>the identification of unique entities within a dataset</em>. Each such entity receives a unique identifier, which is assigned to all the database records that effectively describe the same bibliographic entity. The problem of data duplication and ambiguity arises, among other reasons, from:</p><ol
class="wp-block-list"><li>Lack of a consistent input (transcription) convention;</li><li>Variable levels of input detail and accuracy;</li><li>Missing data;</li><li>Different orders of transcription of the same information;</li><li>Typos.</li></ol><p>The following table excerpt illustrates the problem:</p><figure
class="wp-block-image"><img
decoding="async" width="887" height="361" data-attachment-id="362" data-permalink="https://francescolelli.info/generic/scientometrics-and-scientific-citations-in-patent-a-database/attachment/problem-in-defining-a-unique-record-for-a-unique-reference/" data-orig-file="https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference.png" data-orig-size="887,361" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Problem in Defining a unique record for a unique reference" data-image-description="" data-image-caption="" data-medium-file="https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference-300x122.png" data-large-file="https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference.png" src="https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference.png?8011c3&amp;8011c3" alt="Fig 2. Example of 18 out of 56 records found by a simple search on the exact title match. Thus, even more records referring to the same entity may exist in the database." 
class="wp-image-362" srcset="https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference.png 887w, https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference-300x122.png 300w, https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference-768x313.png 768w, https://francescolelli.info/wp-content/uploads/2019/01/Problem-in-Defining-a-unique-record-for-a-unique-reference-600x244.png 600w" sizes="(max-width: 887px) 100vw, 887px" /><figcaption>Fig 2. Example of 18 out of 56 records found by a simple search on the exact title match. Since the search requires an exact title match, even more records referring to the same entity may exist in the database.</figcaption></figure><p>All of the records shown above refer
to the same entity – the paper by E.F. Codd on the relational database model&nbsp;(Codd, 1970). However, the
references to this entity are given in different ways or are simply
duplicated. For example, record 7 contains neither Codd’s initials nor the
month of publication, while record 8 spells out the abbreviation
“ACM” in full at the end of the string. All the records describe the same bibliographic
entity but are treated as distinct entities by the primary key of the TLS214
relation – the “npl_publn_id” attribute.</p><p>Such a design makes it very difficult to use the information in the table correctly. For example, suppose a researcher is interested in the relation between science and technology. She assumes that a scientific discovery is well proxied by the publication of a scientific paper, while a piece of technology can be modelled as a patent publication. However, because citations are added to the PATSTAT database through an unsupervised procedure, it is difficult for her to specify a query that takes into account all the possible variations in records that describe the same bibliographic entity. As a result, the researcher is unable to properly count all of the scientific references to the same bibliographic entity, which leads to incorrect patent statistics. While it may be possible to identify a single researcher and their body of work (like E.F. Codd), studies on a population of researchers are impossible without prior cleansing and de-duplication of the PATSTAT database.</p><h2 class="wp-block-heading"> Scientometrics: <strong>Problem of disambiguating scientific references</strong></h2><p>The objective of an automated cleaning method is the disambiguation of all scientific references in the original table of the PATSTAT database, which in turn provides techniques for advancing the scientometrics research field. “Scientific references” are the records in the table that describe entities that can be classified as publications. Note that not all records in the table are scientific references; some are references to other patents. The final result of the procedure is a table with <em>clusters of name variants</em> (records) for each unique scientific entity. The next table presents an example cluster with label 231 for the paper by E.F. Codd.</p><table
class="wp-block-table is-style-stripes"><tbody><tr><td> <strong>cluster_id</strong></td><td> <strong>npl_publn_id</strong></td><td> <strong>npl_biblio</strong></td></tr><tr><td> <strong>231</strong></td><td> 2219025</td><td> Codd, E.
F., A Relational Model of Data for Large Shared Data Banks, Communications of
the Association for Computing Machinery, Association for Computing Machinery
13: Jun. 6, 1970, pp. 377-387, XP002219025.</td></tr><tr><td> <strong>231</strong></td><td> 950805382</td><td> CODD,
E.F.: A Relational Model of Data for Large Shared Data Banks. In: Comm. of
the ACM, Vol. 13, Nr. 6, Juni 1970, S. 377-387</td></tr><tr><td> <strong>231</strong></td><td> 953756074</td><td> Codd,
E.F., A Relational Model of Data for Large Shared Data Banks, Communications
of the ACM, 13(6):377-387 (1970).</td></tr><tr><td> <strong>231</strong></td><td> 955210884</td><td> E. Codd,
A Relational Model of Data for Large Shared Data Banks, Communications of the
ACM,vol. 13, No. 6, Jun. 1970, pp. 377-387.</td></tr><tr><td> <strong>231</strong></td><td> 955405309</td><td> Codd,
E.F., A Relational Model of Data for Large Shared Data Banks, Jun. 1970,
Communications of the ACM, vol. 13, No. 6, pp. 377-387.</td></tr><tr><td> <strong>…</strong></td><td> …</td><td> …</td></tr><tr><td> <strong>…</strong></td><td> …</td><td> …</td></tr></tbody></table><p>This introduction should serve as a framework for understanding the following thesis proposal in the area of scientometrics: <a
href="https://francescolelli.info/thesis/human-enhanced-machine-driven-categorization-thesis-proposal/">Human Enhanced Machine Driven-Categorization</a></p> ]]></content:encoded> <wfw:commentRss>https://francescolelli.info/generic/scientometrics-and-scientific-citations-in-patent-a-database/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id
xmlns="com-wordpress:feed-additions:1">360</post-id> </item> </channel> </rss>