File Name: link_USapp_appnum_CNpub_forpri

Observations: 286,158

(Note: links were obtained only for Chinese patents published 2005-2010.)

Description: This file links CNpub_for_priority (restricted to publication years 2005-2010) to USapp_basedat.  Specifically, we begin with the published Chinese patents that make a foreign priority claim based on a US patent application, and we match each claim to the patents in USapp_basedat.  Note that in about a third of cases the priority claim made when filing a Chinese patent application is based on the US provisional application number, which differs from the application number eventually assigned to the patent.  Therefore, we use the provisional application number of each US patent application, as well as the actual application number, in matching.

Specifically, we first match on the actual application number and the exact foreign priority date.  This is a type ‘a1’ match.  Next we match on the provisional application number and the exact foreign priority date.  This is a type ‘a2’ match.  Finally, we also consider matches on the foreign priority year and the application number/provisional application number, or the title and first inventor.  These are match_type=‘b1’ or ‘b2’ below.  They account for a very small share of cases (about 0.6 percent in the table at the bottom).
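The order of these matching steps can be sketched as a simple cascade in which each candidate pair receives the first match type that applies. This is only an illustrative sketch, not the code used to build the file. All dictionary keys (priority_num, priority_date, app_num, app_date, prov_num, prov_date, title, first_inventor) are hypothetical names. The rule for the looser tiers follows one possible reading of the match_type definitions under Variables: the first 40 characters of the title must agree, plus either the first inventor name or the number and the priority year.

```python
from datetime import date
from typing import Optional


def classify_match(cn: dict, us: dict) -> Optional[str]:
    """Return 'a1', 'a2', 'b1', 'b2', or None for one candidate pair.

    All dictionary keys are hypothetical names chosen for this sketch.
    """
    num = cn["priority_num"]
    pdate = cn["priority_date"]
    prov_num = us.get("prov_num")
    prov_date = us.get("prov_date")

    # a1: actual application number and exact foreign priority date.
    if num == us["app_num"] and pdate == us["app_date"]:
        return "a1"

    # a2: provisional application number and exact foreign priority date.
    if prov_num is not None and num == prov_num and pdate == prov_date:
        return "a2"

    # Looser tiers (assumed reading): title must agree on 40 characters.
    same_title = cn["title"][:40].lower() == us["title"][:40].lower()
    same_inventor = cn["first_inventor"].lower() == us["first_inventor"].lower()

    # b1: title plus either first inventor or application number and year.
    app_year_ok = num == us["app_num"] and pdate.year == us["app_date"].year
    if same_title and (same_inventor or app_year_ok):
        return "b1"

    # b2: title plus provisional number and year. The inventor branch is
    # already covered by b1 above, so it is not repeated here.
    prov_year_ok = (
        prov_num is not None
        and num == prov_num
        and prov_date is not None
        and pdate.year == prov_date.year
    )
    if same_title and prov_year_ok:
        return "b2"

    return None


# Made-up records, for illustration only.
us_app = {
    "app_num": "11/000001", "app_date": date(2005, 3, 1),
    "prov_num": "60/000001", "prov_date": date(2004, 3, 5),
    "title": "Widget assembly", "first_inventor": "Smith",
}
cn_claim = {
    "priority_num": "60/000001", "priority_date": date(2004, 3, 5),
    "title": "Widget assembly", "first_inventor": "Smith",
}
print(classify_match(cn_claim, us_app))
```

Because the tiers are checked in order, a pair that satisfies an exact-date rule is never assigned one of the looser types.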

We match observations in CNpub_for_priority published 2005-2010 at a rate of 95.4 percent.  The table at the bottom gives the distribution of match types.  In about 61 percent of cases, the match is on the application number and the exact date.  In about a third of cases, the match is on the provisional application number and the exact date.

We do not include any matches for Chinese patents published in earlier years, because we have not collected reliable information before 2005 on the provisional application numbers of the published US applications.  (See the discussion for the file USapp_provisional.)  Note that in the paper we restrict attention to Chinese patents published 2005-2010, where our coverage is very good.

 

Variables

Variable              Type  Len  Columns in Ascii File  Description
--------------------  ----  ---  ---------------------  -----------
app_numfix            Char  12   1-12                   Link variable to Chinese patent (unique index of CNpub_basedat)
app_publication_num   Char  20   13-32                  Link to US application (along with app_publication_kind).
                                                        app_publication_num*app_publication_kind is the unique index of USapp_basedat
app_publication_kind  Char  3    33-35                  See app_publication_num above
match_type            Char  2    36-37                  =‘a1’ if match to application number and exact foreign priority date
                                                        =‘a2’ if no match above, and match to provisional application number and exact foreign priority date
                                                        =‘b1’ if no match above, and match to title (40 characters), and either the first inventor name or the application number and the foreign priority year
                                                        =‘b2’ if no match above, and match to title (40 characters), and either the first inventor name or the provisional application number and the foreign priority year
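The column positions listed under Variables are enough to read the ASCII file with plain string slicing. The following is a minimal sketch, assuming one record per line and space padding within each field. The function names and the example record are made up for illustration.

```python
# Column positions (1-based, inclusive) as listed under Variables.
LAYOUT = [
    ("app_numfix", 1, 12),
    ("app_publication_num", 13, 32),
    ("app_publication_kind", 33, 35),
    ("match_type", 36, 37),
]


def parse_link_line(line: str) -> dict:
    """Split one fixed-width record into a dict of stripped strings."""
    line = line.rstrip("\r\n")
    # Convert 1-based inclusive positions to Python's 0-based slices.
    return {name: line[start - 1:end].strip() for name, start, end in LAYOUT}


def read_links(path: str, encoding: str = "ascii"):
    """Yield one parsed record per non-blank line of the file at `path`."""
    with open(path, encoding=encoding) as fh:
        for line in fh:
            if line.strip():
                yield parse_link_line(line)


# Made-up record, for illustration only (not taken from the real file).
example = "CN0000000001" + "US20050000001".ljust(20) + "A1".ljust(3) + "a1"
record = parse_link_line(example)
print(record)
```

All fields are kept as strings, in line with the Char type given for every variable.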

 

Table Showing Distribution of Match Type by Year of Chinese Patent Publication

(We start with the 1,080,629 foreign priority claims in CNpub_for_priority and select the 217,879 cases where the foreign priority claim is from the US and the publication year is 2005 or later.  We next select the 168,854 unique values of app_numfix.  For each app_numfix, we ask whether it has at least one match in USapp_basedat; the table below reports the results.)

 

 

Counts of Observations and Percent Distribution by Match Type and Year of Chinese Patent Publication

                  Counts of Observations                       Percent Distribution
         Not Matched       a1      a2   b1   b2     Not Matched     a1     a2    b1    b2
All            7,759  103,630  56,498  915   52            4.60  61.37  33.46  0.54  0.03
By Year
2005           2,247   14,437   6,363  165    9            9.68  62.17  27.40  0.71  0.04
2006           1,389   15,517   7,026  184    8            5.76  64.32  29.12  0.76  0.03
2007           1,201   18,296   9,355  122    4            4.14  63.14  32.28  0.42  0.01
2008           1,017   17,044   9,224  117    7            3.71  62.18  33.65  0.43  0.03
2009           1,100   20,708  12,819  166    8            3.16  59.50  36.84  0.48  0.02
2010             805   17,628  11,711  161   16            2.65  58.14  38.62  0.53  0.05
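
As a check on the figures in the table, the percent distribution and the overall match rate can be recomputed from the counts in the All row. This is a small sketch using only numbers reported in the table. The variable names are made up for illustration.

```python
# Counts from the "All" row of the table.
counts = {"Not Matched": 7759, "a1": 103630, "a2": 56498, "b1": 915, "b2": 52}

# Total number of unique app_numfix values considered.
total = sum(counts.values())

# Percent distribution by match type, rounded to two decimals.
shares = {k: round(100 * v / total, 2) for k, v in counts.items()}

# Overall match rate: everything except "Not Matched".
match_rate = round(100 * (total - counts["Not Matched"]) / total, 1)

print(total, shares, match_rate)
```

The total equals the 168,854 unique values of app_numfix, and the match rate agrees with the 95.4 percent quoted in the description.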