ldy
61ef0081d8
Pushed new "ejde_buffer_transformation.zip"
2023-11-02 04:33:02 +08:00
ldy
ad63bcf6c4
Updated ejde parser format
...
Fixed duplicate data dumping problem
Pushed new "ejde_buffer.zip"
2023-11-02 04:28:06 +08:00
Chenxiao Xia
50e30e105b
New code for transformation. Transform the structure of SpringerOpen data
2023-11-01 20:11:20 +08:00
Chenxiao Xia
7f9ab94adc
New code for transformation
2023-11-01 13:13:28 +08:00
Chenxiao Xia
35ea1dd424
Remove error code
2023-11-01 13:12:38 +08:00
Chenxiao Xia
ad6ba8832a
Merge branch 'main' of https://code.uic.edu.cn/scholar-data-mining/cst-project
2023-11-01 13:08:28 +08:00
ldy
dd0c4379da
Push: ejde transformed data
2023-11-01 13:07:24 +08:00
Chenxiao Xia
c3c460a4dc
Fix bugs; new code for transforming the ejde data structure
2023-11-01 12:57:09 +08:00
Chenxiao Xia
4c2c68feca
The data of ejde.com has been transformed
2023-11-01 12:37:08 +08:00
Chenxiao Xia
24aa62c8db
Fix the output error; a new copy of the data has been uploaded
2023-11-01 10:28:08 +08:00
Chenxiao Xia
e07617bebc
New code for transforming the data structure
2023-10-29 15:21:01 +08:00
Chenxiao Xia
11e326ea76
Fix some bugs
2023-10-29 15:19:40 +08:00
Chenxiao Xia
e37374d1e2
Add a new file including transformed data
2023-10-29 15:17:53 +08:00
ldy
8751b55617
updated Aminer security key
2023-10-10 10:25:04 +08:00
Chenxiao Xia
193581cd6a
Update code for merging author data by API
2023-10-09 23:24:58 +08:00
Chenxiao Xia
9b33cbabe7
Full data from calling the aminer API
2023-10-06 13:55:50 +08:00
ldy
123da7e7e8
Update 6 files
...
- /.idea/inspectionProfiles/Project_Default.xml
- /.idea/inspectionProfiles/profiles_settings.xml
- /.idea/.gitignore
- /.idea/cst-project.iml
- /.idea/modules.xml
- /.idea/vcs.xml
2023-09-26 15:40:17 +00:00
ldy
ef9ab9abb1
Optimized file allocation
2023-09-26 23:37:43 +08:00
ldy
f45c63fa8c
New features:
...
1. unidecode for EJDE parser
2. get paper citation by aminer api
Known bugs:
1. function "scholarly_get_citation" cannot work properly
2023-09-23 10:43:46 +08:00
Chenxiao Xia
2f6f86a48e
Fix bugs and add new code to search author data without email information
2023-09-20 23:29:42 +08:00
Chenxiao Xia
2a1fcfc4cd
New function to divide author data according to their last name's first letter
2023-09-16 18:48:06 +08:00
Chenxiao Xia
34fb579f7c
Fix bugs
2023-09-16 18:46:52 +08:00
ldy
a2284b7b45
git push for renew personal info
2023-09-12 13:23:41 +08:00
Chenxiao Xia
d374a3766b
Update new save code
2023-09-12 09:05:55 +08:00
XCX
c75d2545d7
Change into zip files
2023-08-21 12:19:35 +08:00
XCX
2a249cb6d6
Original data and merged data
2023-08-21 12:16:08 +08:00
ldy
b5ce290ea5
Optimization:
...
1. add conference special issue papers
2. optimized counting process
3. enhanced saving robustness
2023-08-20 16:48:43 +08:00
XCX
e217342ce2
Full same web merge code
2023-08-20 00:07:47 +08:00
ldy
ba3671b5fd
Optimization:
...
1. add special issue papers
2. optimized doi regular expression
Bug fix:
1. multiple email problem
Reminder:
1. author's email data structure changed due to multiple email problem
2023-08-19 21:15:12 +08:00
XCX
88bcbf5b8f
Changed the data structure
2023-08-18 19:43:22 +08:00
XCX
1d79556c42
Change the data structure
2023-08-13 21:38:48 +08:00
XCX
1602d03e9d
Change the data structure
2023-08-13 21:36:22 +08:00
XCX
27707a058c
Change the data structure
2023-08-13 21:29:10 +08:00
ldy
083e6c87eb
Optimization:
...
strip "\newline" in author name
2023-08-11 20:45:04 +08:00
ldy
ed469ee362
Bug Fix:
...
1. reformat regular expressions for keyword matching
2023-08-11 19:52:40 +08:00
ldy
b1eba69085
Bug Fix:
...
1. hr_count soup should be article_soup
2023-08-11 19:16:03 +08:00
ldy
68a755a633
Bug Fix:
...
1. added splitting of author data on "\n"
2. added splitting of names by "."
3. added method for extracting author info when there are 3 hr tags
2023-08-11 19:13:33 +08:00
ldy
f97195c94d
Bug Fix:
...
handled exception when the volume website has no title
2023-08-11 18:05:15 +08:00
ldy
3e78e9f48e
Optimization:
...
1. added new regular expression format for volume
2. added new strip method for msc
3. deleted blank-space author
4. optimized middle name strip method
5. added new matching pattern for no table author list
6. added exception storing for AUTHOR SEARCHING ERROR
Bug fix:
1. error record saving
2023-08-11 14:26:59 +08:00
ldy
69b10a9f72
Merge remote-tracking branch 'origin/main'
2023-08-11 12:44:53 +08:00
XCX
e504e73409
Changed the file path of saving data
2023-08-11 12:22:48 +08:00
XCX
8ea31d08f4
Fix the bug of adding duplicate data
2023-08-11 12:19:55 +08:00
ldy
35f5f2ac5e
Optimization:
...
clustered error files into a folder
2023-08-11 11:42:02 +08:00
ldy
7726650eaa
Bug fixed:
...
ignored blank-space elements in the middle name list
2023-08-10 13:40:26 +08:00
ldy
71e613d994
Optimization:
...
less memory usage
data collection for volume HTML format error
added time elapse monitor
2023-08-10 12:57:28 +08:00
ldy
2c25682f81
Bug Fix:
...
1. unworkable retrying function back online baby
New Function:
1. reformatted datetime_transform function to handle more month typos
2. split process_article function into 3 functions to make better use of multi-threading and improve running time
3. renewed article url search technique to handle different volume websites
4. more exception handling
5. improved keywords and affiliation strip method
6. added methods for processing author data when there exists no author table
7. added code for retrying papers that failed processing
8. more detailed error messages storage
2023-08-10 01:15:17 +08:00
XCX
a9c753567c
Add code for saving data
2023-08-09 12:22:42 +08:00
XCX
9ee9bc4462
Replace the code for merging data
2023-08-08 22:57:29 +08:00
XCX
73cf15980f
Fix the bugs
2023-08-08 22:48:55 +08:00
ldy
49746b779b
handled 2 typos in month while formatting date
2023-08-08 13:24:51 +08:00