Commit Graph

  • ad427c24dc Fixed: missing appended data while multithreading. Pushed: new parsed zip main ldy 2023-11-02 11:19:59 +08:00
  • 61ef0081d8 Pushed new "ejde_buffer_transformation.zip" ldy 2023-11-02 04:33:02 +08:00
  • ad63bcf6c4 Updated ejde parser format; fixed duplicate data dumping problem; pushed new "ejde_buffer.zip" ldy 2023-11-02 04:28:06 +08:00
  • 50e30e105b New code for transformation. Transform the structure of SpringerOpen data Chenxiao Xia 2023-11-01 20:11:20 +08:00
  • 7f9ab94adc New code for transformation Chenxiao Xia 2023-11-01 13:13:28 +08:00
  • 35ea1dd424 Remove error code Chenxiao Xia 2023-11-01 13:12:38 +08:00
  • ad6ba8832a Merge branch 'main' of https://code.uic.edu.cn/scholar-data-mining/cst-project Chenxiao Xia 2023-11-01 13:08:28 +08:00
  • dd0c4379da Push: ejde transformed data ldy 2023-11-01 13:07:24 +08:00
  • c3c460a4dc Fix bugs; new code for transforming ejde data structure Chenxiao Xia 2023-11-01 12:57:09 +08:00
  • 4c2c68feca The data of ejde.com has been transformed Chenxiao Xia 2023-11-01 12:37:08 +08:00
  • 24aa62c8db Fix the output error; a new copy of the data has been uploaded Chenxiao Xia 2023-11-01 10:28:08 +08:00
  • e07617bebc A new code for transforming data structure Chenxiao Xia 2023-10-29 15:21:01 +08:00
  • 11e326ea76 Fix some bugs Chenxiao Xia 2023-10-29 15:19:40 +08:00
  • e37374d1e2 Add a new file that includes the transformed data Chenxiao Xia 2023-10-29 15:17:53 +08:00
  • 8751b55617 updated Aminer security key ldy 2023-10-10 10:25:04 +08:00
  • 193581cd6a Update code for merging author data by API Chenxiao Xia 2023-10-09 23:24:58 +08:00
  • 9b33cbabe7 full data of calling aminer API Chenxiao Xia 2023-10-06 13:55:50 +08:00
  • 123da7e7e8 Update 6 files ldy 2023-09-26 15:40:17 +00:00
  • ef9ab9abb1 Optimized file allocation ldy 2023-09-26 23:37:43 +08:00
  • f45c63fa8c New features: 1. unidecode for EJDE parser; 2. get paper citation by aminer api. Known bugs: 1. function "scholarly_get_citation" does not work properly ldy 2023-09-23 10:43:46 +08:00
  • 2f6f86a48e Fix bugs and add new code to search author data without email information Chenxiao Xia 2023-09-20 23:29:42 +08:00
  • 2a1fcfc4cd New function to divide author data according to their last name's first letter Chenxiao Xia 2023-09-16 18:48:06 +08:00
  • 34fb579f7c Fix bugs Chenxiao Xia 2023-09-16 18:46:52 +08:00
  • a2284b7b45 git push for renew personal info ldy 2023-09-12 13:23:41 +08:00
  • d374a3766b Update new save code Chenxiao Xia 2023-09-12 09:05:55 +08:00
  • c75d2545d7 Change into zip files XCX 2023-08-21 12:19:35 +08:00
  • 2a249cb6d6 Origin data and merged data XCX 2023-08-21 12:16:08 +08:00
  • b5ce290ea5 Optimization: 1. add conference special issue papers; 2. optimized counting process; 3. enhanced saving robustness ldy 2023-08-20 16:48:43 +08:00
  • e217342ce2 Full same web merge code XCX 2023-08-20 00:07:47 +08:00
  • ba3671b5fd Optimization: 1. add special issue papers; 2. optimized doi regular expression. Bug fix: 1. multiple email problem. Reminder: 1. author's email data structure changed due to multiple email problem ldy 2023-08-19 21:15:12 +08:00
  • 88bcbf5b8f Changed the data structure XCX 2023-08-18 19:43:22 +08:00
  • 1d79556c42 Change the data structure XCX 2023-08-13 21:38:48 +08:00
  • 1602d03e9d Change the data structure XCX 2023-08-13 21:36:22 +08:00
  • 27707a058c Change the data structure XCX 2023-08-13 21:29:10 +08:00
  • 083e6c87eb Optimization: strip "\newline" in author name ldy 2023-08-11 20:45:04 +08:00
  • ed469ee362 Bug Fix: 1. reformat regular expressions for keyword matching ldy 2023-08-11 19:52:40 +08:00
  • b1eba69085 Bug Fix: 1. hr_count soup should be article_soup ldy 2023-08-11 19:16:03 +08:00
  • 68a755a633 Bug Fix: 1. added splitting of author data when it hits "\n"; 2. added splitting of names by "."; 3. added method for extracting author info when there are 3 hr tags ldy 2023-08-11 19:13:33 +08:00
  • f97195c94d Bug Fix: handled exception when the volume website has no title ldy 2023-08-11 18:05:15 +08:00
  • 3e78e9f48e Optimization: 1. added new regular expression format for volume; 2. added new strip method for msc; 3. deleted blank-space authors; 4. optimized middle name strip method; 5. added new matching pattern for author lists without a table; 6. added exception storing for AUTHOR SEARCHING ERROR. Bug fix: 1. error record saving ldy 2023-08-11 14:26:59 +08:00
  • 69b10a9f72 Merge remote-tracking branch 'origin/main' ldy 2023-08-11 12:44:53 +08:00
  • e504e73409 Changed the file path of saving data XCX 2023-08-11 12:22:48 +08:00
  • 8ea31d08f4 Fix the bug of adding duplicate data XCX 2023-08-11 12:19:55 +08:00
  • 35f5f2ac5e Optimization: clustered error files into a folder ldy 2023-08-11 11:42:02 +08:00
  • 7726650eaa Bug fixed: ignored blank-space elements in the middle name list ldy 2023-08-10 13:40:26 +08:00
  • 71e613d994 Optimization: less memory usage; data collection for volume HTML format errors; added elapsed-time monitor ldy 2023-08-10 12:57:28 +08:00
  • 2c25682f81 Bug Fix: 1. the previously unworkable retrying function is back online. New Function: 1. reformatted datetime_transform function to handle more month typos; 2. reformatted process_article function into 3 functions to use multiple threads for better running time; 3. renewed article url search technique to handle different volume websites; 4. more exception handling; 5. improved keywords and affiliation strip method; 6. added methods for processing author data when there is no author table; 7. added code for retrying papers that failed processing; 8. more detailed error message storage ldy 2023-08-10 01:15:17 +08:00
  • a9c753567c Add code for saving data XCX 2023-08-09 12:22:42 +08:00
  • 9ee9bc4462 Replace the code for merging data XCX 2023-08-08 22:57:29 +08:00
  • 73cf15980f Fix the bugs XCX 2023-08-08 22:48:55 +08:00
  • 49746b779b handled 2 typos in month while formatting date ldy 2023-08-08 13:24:51 +08:00
  • 1e98615778 New code for same-web data merge: 00_File_merge XCX 2023-08-06 19:42:43 +08:00
  • e9bdb9cdff deleted unnecessary retrying commands ldy 2023-08-03 12:06:22 +08:00
  • e49e829682 Fix the saving problem XCX 2023-08-03 12:01:51 +08:00
  • 2d1f2c504d Update EJDE_spider/ejde_main.py ldy 2023-08-02 11:21:37 +08:00
  • 2fc3b85bab Corrected the loops; the program no longer adds the same data repeatedly XCX 2023-08-01 19:11:24 +08:00
  • 01c1a7d978 Changed the code to unify the time format XCX 2023-07-31 18:19:11 +08:00
  • ee0f956645 Merge branch 'main' of https://git.ecwuuuuu.com/datamining/CST_scrawlCode SHL 2023-07-27 12:05:50 +08:00
  • 20cf71530a 10 years articles SHL 2023-07-27 12:04:37 +08:00
  • c1e1e59e05 Update EJQTDE_spider/ejqtde_main.py XCX 2023-07-27 10:30:26 +08:00
  • 07c334a903 Delete file XCX 2023-07-27 10:28:51 +08:00
  • 26fed37e17 Modified old code XCX 2023-07-27 10:26:02 +08:00
  • cfa9345a79 Add new spider code for math.u-szeged.hu/ejqtde. Modified the SpringerOpen_spider code XCX 2023-07-26 23:25:30 +08:00
  • b2c845dc6e the code of projecteuclid_spider SHL 2023-07-14 20:47:47 +08:00
  • d8addf5204 Update the code these weeks XCX 2023-07-14 18:50:36 +08:00
  • 04806fa367 Initial commit XCX 2023-07-14 18:29:27 +08:00