21 Commits

Author SHA1 Message Date
ldy
b5ce290ea5 Optimization:
1. add conference special issue papers
2. optimized counting process
3. enhanced saving robustness
2023-08-20 16:48:43 +08:00
ldy
ba3671b5fd Optimization:
1. add special issue papers
2. optimized doi regular expression
Bug fix:
1. multiple email problem
Reminder:
1. author's email data structure changed due to multiple email problem
2023-08-19 21:15:12 +08:00
XCX
88bcbf5b8f Changed the data structure 2023-08-18 19:43:22 +08:00
XCX
1d79556c42 Change the data structure 2023-08-13 21:38:48 +08:00
XCX
1602d03e9d Change the data structure 2023-08-13 21:36:22 +08:00
XCX
27707a058c Change the data structure 2023-08-13 21:29:10 +08:00
ldy
083e6c87eb Optimization:
strip "\newline" in author name
2023-08-11 20:45:04 +08:00
ldy
ed469ee362 Bug Fix:
1. reformat regular expressions for keyword matching
2023-08-11 19:52:40 +08:00
ldy
b1eba69085 Bug Fix:
1. hr_count soup should be article_soup
2023-08-11 19:16:03 +08:00
ldy
68a755a633 Bug Fix:
1. added splitting of author data on "\n"
2. added splitting of names by "."
3. added method for extracting author info when there are 3 hr tags
2023-08-11 19:13:33 +08:00
ldy
f97195c94d Bug Fix:
handled exception when the volume website has no title
2023-08-11 18:05:15 +08:00
ldy
3e78e9f48e Optimization:
1. added new regular expression format for volume
2. added new strip method for msc
3. deleted blank-space author
4. optimized middle name strip method
5. added new matching pattern for no table author list
6. added exception storing for AUTHOR SEARCHING ERROR
Bug fix:
1. error record saving
2023-08-11 14:26:59 +08:00
ldy
69b10a9f72 Merge remote-tracking branch 'origin/main' 2023-08-11 12:44:53 +08:00
XCX
e504e73409 Changed the file path of saving data 2023-08-11 12:22:48 +08:00
ldy
35f5f2ac5e Optimization:
clustered error files into a folder
2023-08-11 11:42:02 +08:00
ldy
7726650eaa Bug fixed:
ignored blank-space elements in the middle name list
2023-08-10 13:40:26 +08:00
ldy
71e613d994 Optimization:
less memory usage
data collection for volume HTML format error
added elapsed-time monitor
2023-08-10 12:57:28 +08:00
ldy
2c25682f81 Bug Fix:
1. previously non-working retry function is back online
New Function:
1. reformatted datetime_transform function to handle more month typos
2. refactored process_article function into 3 functions to make better use of multi-threading and improve running time
3. renewed article url search technique to handle different volume websites
4. more exception handling
5. improved keyword and affiliation strip methods
6. added methods for processing author data when no author table exists
7. added code to retry papers that failed processing
8. more detailed error messages storage
2023-08-10 01:15:17 +08:00
XCX
9ee9bc4462 Replace the code for merging data 2023-08-08 22:57:29 +08:00
ldy
49746b779b handled 2 typos in month while formatting date 2023-08-08 13:24:51 +08:00
XCX
1e98615778 A new code for same web data merge00_File_merge 2023-08-06 19:42:43 +08:00