65 Commits

Author SHA1 Message Date
ldy
61ef0081d8 Pushed new "ejde_buffer_transformation.zip" 2023-11-02 04:33:02 +08:00
ldy
ad63bcf6c4 Updated ejde parser format
Fixed duplicate data dumping problem
Pushed new "ejde_buffer.zip"
2023-11-02 04:28:06 +08:00
Chenxiao Xia
50e30e105b New code for transformation. Transform the structure of SpringerOpen data 2023-11-01 20:11:20 +08:00
Chenxiao Xia
7f9ab94adc New code for transformation 2023-11-01 13:13:28 +08:00
Chenxiao Xia
35ea1dd424 Remove error code 2023-11-01 13:12:38 +08:00
Chenxiao Xia
ad6ba8832a Merge branch 'main' of https://code.uic.edu.cn/scholar-data-mining/cst-project 2023-11-01 13:08:28 +08:00
ldy
dd0c4379da Push: ejde transformed data 2023-11-01 13:07:24 +08:00
Chenxiao Xia
c3c460a4dc Fix bugs; new code for transforming the ejde data structure 2023-11-01 12:57:09 +08:00
Chenxiao Xia
4c2c68feca The data of ejde.com has been transformed 2023-11-01 12:37:08 +08:00
Chenxiao Xia
24aa62c8db Fix the output error; a new copy of the data has been uploaded 2023-11-01 10:28:08 +08:00
Chenxiao Xia
e07617bebc New code for transforming the data structure 2023-10-29 15:21:01 +08:00
Chenxiao Xia
11e326ea76 Fix some bugs 2023-10-29 15:19:40 +08:00
Chenxiao Xia
e37374d1e2 Add a new file that includes the transformed data 2023-10-29 15:17:53 +08:00
ldy
8751b55617 updated Aminer security key 2023-10-10 10:25:04 +08:00
Chenxiao Xia
193581cd6a Update code for merging author data by API 2023-10-09 23:24:58 +08:00
Chenxiao Xia
9b33cbabe7 Full data from calling the aminer API 2023-10-06 13:55:50 +08:00
ldy
123da7e7e8 Update 6 files
- /.idea/inspectionProfiles/Project_Default.xml
- /.idea/inspectionProfiles/profiles_settings.xml
- /.idea/.gitignore
- /.idea/cst-project.iml
- /.idea/modules.xml
- /.idea/vcs.xml
2023-09-26 15:40:17 +00:00
ldy
ef9ab9abb1 Optimized file allocation 2023-09-26 23:37:43 +08:00
ldy
f45c63fa8c New feature:
1. unidecode for EJDE parser
2. get paper citation by aminer api
Known bugs:
1. function "scholarly_get_citation" does not work properly
2023-09-23 10:43:46 +08:00
Chenxiao Xia
2f6f86a48e Fix bugs and add new code to search author data without email information 2023-09-20 23:29:42 +08:00
Chenxiao Xia
2a1fcfc4cd New function to divide author data according to their last name's first letter 2023-09-16 18:48:06 +08:00
Chenxiao Xia
34fb579f7c Fix bugs 2023-09-16 18:46:52 +08:00
ldy
a2284b7b45 git push to renew personal info 2023-09-12 13:23:41 +08:00
Chenxiao Xia
d374a3766b Update new save code 2023-09-12 09:05:55 +08:00
XCX
c75d2545d7 Change into zip files 2023-08-21 12:19:35 +08:00
XCX
2a249cb6d6 Original data and merged data 2023-08-21 12:16:08 +08:00
ldy
b5ce290ea5 Optimization:
1. add conference special issue papers
2. optimized counting process
3. enhanced saving robustness
2023-08-20 16:48:43 +08:00
XCX
e217342ce2 Full same web merge code 2023-08-20 00:07:47 +08:00
ldy
ba3671b5fd Optimization:
1. add special issue papers
2. optimized doi regular expression
Bug fix:
1. multiple email problem
Reminder:
1. author's email data structure changed due to multiple email problem
2023-08-19 21:15:12 +08:00
XCX
88bcbf5b8f Changed the data structure 2023-08-18 19:43:22 +08:00
XCX
1d79556c42 Change the data structure 2023-08-13 21:38:48 +08:00
XCX
1602d03e9d Change the data structure 2023-08-13 21:36:22 +08:00
XCX
27707a058c Change the data structure 2023-08-13 21:29:10 +08:00
ldy
083e6c87eb Optimization:
strip "\newline" in author name
2023-08-11 20:45:04 +08:00
ldy
ed469ee362 Bug Fix:
1. reformat regular expressions for keyword matching
2023-08-11 19:52:40 +08:00
ldy
b1eba69085 Bug Fix:
1. hr_count soup should be article_soup
2023-08-11 19:16:03 +08:00
ldy
68a755a633 Bug Fix:
1. added splitting of author data when it hits "\n"
2. added splitting of names by "."
3. added method for extracting author info when there are 3 hr tags
2023-08-11 19:13:33 +08:00
ldy
f97195c94d Bug Fix:
handled exception when the volume website has no title
2023-08-11 18:05:15 +08:00
ldy
3e78e9f48e Optimization:
1. added new regular expression format for volume
2. added new strip method for msc
3. deleted blank-space author
4. optimized middle name strip method
5. added new matching pattern for no table author list
6. added exception storing for AUTHOR SEARCHING ERROR
Bug fix:
1. error record saving
2023-08-11 14:26:59 +08:00
ldy
69b10a9f72 Merge remote-tracking branch 'origin/main' 2023-08-11 12:44:53 +08:00
XCX
e504e73409 Changed the file path of saving data 2023-08-11 12:22:48 +08:00
XCX
8ea31d08f4 Fix the bug of adding duplicate data 2023-08-11 12:19:55 +08:00
ldy
35f5f2ac5e Optimization:
clustered error files into a folder
2023-08-11 11:42:02 +08:00
ldy
7726650eaa Bug fixed:
ignored blank-space elements in the middle name list
2023-08-10 13:40:26 +08:00
ldy
71e613d994 Optimization:
less memory usage
data collection for volume HTML format error
added time elapse monitor
2023-08-10 12:57:28 +08:00
ldy
2c25682f81 Bug Fix:
1. unworkable retrying function back online baby
New Function:
1. reformatted datetime_transform function to handle more month typos
2. split process_article function into 3 functions so that multi-threading improves running time
3. renewed article url search technique to handle different volume websites
4. more exception handling
5. improved keywords and affiliation strip method
6. added methods for processing author data when there exists no author table
7. added code for retrying papers that failed processing
8. more detailed error messages storage
2023-08-10 01:15:17 +08:00
XCX
a9c753567c Add code for saving data 2023-08-09 12:22:42 +08:00
XCX
9ee9bc4462 Replace the code for merging data 2023-08-08 22:57:29 +08:00
XCX
73cf15980f Fix the bugs 2023-08-08 22:48:55 +08:00
ldy
49746b779b handled 2 typos in month while formatting date 2023-08-08 13:24:51 +08:00