ldy
2c25682f81
Bug Fix:
...
1. Restored the previously non-working retry function
New Features:
1. Reworked the datetime_transform function to handle more month typos
2. Split the process_article function into 3 functions to make better use of multithreading and improve running time
3. Updated the article URL search technique to handle the websites of different volumes
4. Added more exception handling
5. Improved the keyword and affiliation stripping method
6. Added methods for processing author data when no author table exists
7. Added code to retry papers that failed processing
8. Store more detailed error messages
2023-08-10 01:15:17 +08:00
XCX
a9c753567c
Add code for saving data
2023-08-09 12:22:42 +08:00
XCX
9ee9bc4462
Replace the code for merging data
2023-08-08 22:57:29 +08:00
XCX
73cf15980f
Fix the bugs
2023-08-08 22:48:55 +08:00
ldy
49746b779b
Handled 2 month typos when formatting dates
2023-08-08 13:24:51 +08:00
XCX
1e98615778
New code for merging data from the same website: 00_File_merge
2023-08-06 19:42:43 +08:00
ldy
e9bdb9cdff
Deleted unnecessary retry commands
2023-08-03 12:06:22 +08:00
XCX
e49e829682
Fix the saving problem
2023-08-03 12:01:51 +08:00
ldy
2d1f2c504d
Update EJDE_spider/ejde_main.py
...
adjust output datetime format
2023-08-02 11:21:37 +08:00
XCX
2fc3b85bab
Corrected the loops; the program no longer adds the same data repeatedly
2023-08-01 19:11:24 +08:00
XCX
01c1a7d978
Changed the code to unify the time format
2023-07-31 18:19:11 +08:00
SHL
ee0f956645
Merge branch 'main' of https://git.ecwuuuuu.com/datamining/CST_scrawlCode
2023-07-27 12:05:50 +08:00
SHL
20cf71530a
10 years of articles
2023-07-27 12:04:37 +08:00
XCX
c1e1e59e05
Update EJQTDE_spider/ejqtde_main.py
2023-07-27 10:30:26 +08:00
XCX
07c334a903
Delete file
...
This file has been moved to another folder, ProjectEuclid_spider, and the original file has been backed up locally
Signed-off-by: XCX <xcx@jack@ecwuuuuu.com>
2023-07-27 10:28:51 +08:00
XCX
26fed37e17
Modified old code
2023-07-27 10:26:02 +08:00
XCX
cfa9345a79
Added new spider code for math.u-szeged.hu/ejqtde. Modified the SpringerOpen_spider code
2023-07-26 23:25:30 +08:00
SHL
b2c845dc6e
Add the projecteuclid_spider code
2023-07-14 20:47:47 +08:00
XCX
d8addf5204
Update the code from the past few weeks
2023-07-14 18:50:36 +08:00
XCX
04806fa367
Initial commit
2023-07-14 18:29:27 +08:00