Three Ways for C to Read Excel and Comparative Analysis

2021-08-17 00:44:06
OfStack

(1) OleDB mode

Advantages: Excel is directly treated as a data source, and the content is directly read through SQL, so the reading speed is faster.

Disadvantages: The way of reading data is not flexible enough, and it is impossible to read a certain cell directly. Only after reading the whole Sheet page (the result is Datatable), the specified value can be obtained in Datatable according to the number of columns and columns.

When the amount of Excel data is large. It will take up a lot of memory, and when there is not enough memory, it will throw an exception of memory overflow.

The read code is as follows:


 public DataTable GetExcelTableByOleDB(string strExcelPath, string tableName)
 {
   try
   {
     DataTable dtExcel = new DataTable();
     // Data table 
     DataSet ds = new DataSet();
     // Get the file extension 
     string strExtension = System.IO.Path.GetExtension(strExcelPath);
     string strFileName = System.IO.Path.GetFileName(strExcelPath);
     //Excel Connection of 
     OleDbConnection objConn = null;
     switch (strExtension)
     {
       case ".xls":
         objConn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + strExcelPath + ";" + "Extended Properties=\"Excel 8.0;HDR=NO;IMEX=1;\"");
         break;
       case ".xlsx":
         objConn = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + strExcelPath + ";" + "Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1;\"");
         break;
       default:
         objConn = null;
         break;
     }
     if (objConn == null)
     {
       return null;
     }
     objConn.Open();
     // Get Excel All in Sheet Table information 
     //System.Data.DataTable schemaTable = objConn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
     // Get Excel The first part of 1 A Sheet Table name 
     //string tableName = schemaTable.Rows[0][2].ToString().Trim();
     string strSql = "select * from [" + tableName + "]";
     // Get Excel Specify Sheet Information in the table 
     OleDbCommand objCmd = new OleDbCommand(strSql, objConn);
     OleDbDataAdapter myData = new OleDbDataAdapter(strSql, objConn);
     myData.Fill(ds, tableName);// Fill in data 
     objConn.Close();
     //dtExcel That is excel Information stored in the specified table in the file 
     dtExcel = ds.Tables[tableName];
     return dtExcel;
   }
   catch
   {
     return null;
   }
 }

The following description 1 below the connection string

HDR=Yes, which means that the first row is the title and is not used as data (but in actual use, if there is a complex value in the first row, the read Datatable column title will be automatically set to F1, F2, etc., which is inconsistent with the actual application, so at that time, all the contents were read into Datatable by HDR=No, and then the first row was manually set as the title); IMEX (IMport EXport mode) Settings
IMEX has three modes:
0 is Export mode
1 is Import mode
2 is Linked mode (full update capabilities)
What I want to explain here is the IMEX parameter, because different modes represent different read and write behaviors:
When IMEX = 0, it is "export mode", and Excel files opened in this mode can only be used for "write" purposes.
When IMEX = 1, it is "import mode", and Excel files opened in this mode can only be used for "read" purposes.
When IMEX = 2, it is "link mode", and the Excel files opened in this mode can support both "read" and "write" purposes.

---------------------------------

In addition, when reading files for the Excel 2007 version, the version should be changed from 8.0 to 12.0, and the driver can no longer use Jet, but should use ACE. Responsible will cause the error "No installable ISAM was found".

---------------------------------

It is also found on the Internet that the number of Sheet tables taken out in this way is more than the number of Sheet tables in the actual Excel tables, for two reasons:

1. The name taken out includes the name in XL naming manager (see the formula of XL2007-naming manager, shortcut key Crtl+F3);

2. The name taken out includes the suffix FilterDatabase, which is used by XL to record the range of Filter.

For the first point, it is relatively simple to delete the contents in the existing naming manager; The second point is troublesome to deal with. After Filter is deleted, these names are still retained. The simple way is to add Sheet and then add the original Sheet Copy. However, the actual situation cannot do the above check for each Excel. The filtering scheme is given below. (We have verified this problem, please verify it yourself)


 //objConn For reading Excel The link of, the following is filtered to get a valid Sheet Page name collection 
  System.Data.DataTable schemaTable = objConn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
  List<string> lstSheetNames = new List<string>();
  for (int i = 0; i < schemaTable.Rows.Count; i++)
  {
    string strSheetName = (string)dtSheetName.Rows[i]["TABLE_NAME"];
    if (strSheetName.Contains("$") && !strSheetName.Replace("'", "").EndsWith("$"))
    {
      // Invalid filter SheetName Over ....
      continue;
    }
    if (lstSheetNames != null && !lstSheetNames.Contains(strSheetName))
      lstSheetNames.Add(strSheetName);
  }

Because it is invalid to read SheetName1, the last character will not be $. If the SheetName has 1 special symbol, the read SheetName will be automatically enclosed in single quotation marks. For example, when SheetName is edited into MySheet (1) in Excel, the read SheetName is' MySheet (1) $', so it is best to filter single quotation marks under 1 before judging whether the last character is $.

---------------------------------

(2) The way Com components are implemented by adding Microsoft. Office. Interop. Excel references

Advantages: It can read the data in Excel very flexibly, and users can call various functions flexibly to deal with it.

Disadvantages: Based on cell processing, the reading speed is slow, and it is best not to use this method to read files with large amount of data.

You need to add a corresponding DLL reference, which must exist before it can be used. If the Web site is deployed on IIS, you also need to have Excel installed on the server machine, and sometimes you need to configure IIS permissions.

The read code is as follows:


 private Stopwatch wath = new Stopwatch();
 /// <summary>
 ///  Use COM Read Excel
 /// </summary>
 /// <param name="excelFilePath"> Path </param>
 /// <returns>DataTabel</returns>
 public System.Data.DataTable GetExcelData(string excelFilePath)
 {
   Excel.Application app = new Excel.Application();
   Excel.Sheets sheets;
   Excel.Workbook workbook = null;
   object oMissiong = System.Reflection.Missing.Value;
   System.Data.DataTable dt = new System.Data.DataTable();
   wath.Start();
   try
   {
     if (app == null)
     {
       return null;
     }
     workbook = app.Workbooks.Open(excelFilePath, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, 
       oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong);
     // Read data into the DataTable Middle- Start  
     sheets = workbook.Worksheets;
     Excel.Worksheet worksheet = (Excel.Worksheet)sheets.get_Item(1);// Read the 1 Table 
     if (worksheet == null)
       return null;
     string cellContent;
     int iRowCount = worksheet.UsedRange.Rows.Count;
     int iColCount = worksheet.UsedRange.Columns.Count;
     Excel.Range range;
     // Responsible for column header Start
     DataColumn dc;
     int ColumnID = 1;
     range = (Excel.Range)worksheet.Cells[1, 1];
     while (range.Text.ToString().Trim() != "")
     {
       dc = new DataColumn();
       dc.DataType = System.Type.GetType("System.String");
       dc.ColumnName = range.Text.ToString().Trim();
       dt.Columns.Add(dc);
 
       range = (Excel.Range)worksheet.Cells[1, ++ColumnID];
     }
     //End
     for (int iRow = 2; iRow <= iRowCount; iRow++)
     {
       DataRow dr = dt.NewRow();
       for (int iCol = 1; iCol <= iColCount; iCol++)
       {
         range = (Excel.Range)worksheet.Cells[iRow, iCol];
         cellContent = (range.Value2 == null) ? "" : range.Text.ToString();
           dr[iCol - 1] = cellContent;
       }
       dt.Rows.Add(dr);
     }
     wath.Stop();
     TimeSpan ts = wath.Elapsed;
     // Read data into the DataTable Middle- End
     return dt;
   }
   catch
   {
     return null;
   }
   finally
   {
     workbook.Close(false, oMissiong, oMissiong);
     System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
     workbook = null;
     app.Workbooks.Close();
     app.Quit();
     System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
     app = null;
     GC.Collect();
     GC.WaitForPendingFinalizers();
   }
 }
 /// <summary>
 ///  Use COM Multithreaded reading Excel ( 1  The main thread, 4  Secondary thread) 
 /// </summary>
 /// <param name="excelFilePath"> Path </param>
 /// <returns>DataTabel</returns>
 public System.Data.DataTable ThreadReadExcel(string excelFilePath)
 {
   Excel.Application app = new Excel.Application();
   Excel.Sheets sheets = null;
   Excel.Workbook workbook = null;
   object oMissiong = System.Reflection.Missing.Value;
   System.Data.DataTable dt = new System.Data.DataTable();
   wath.Start();
   try
   {
     if (app == null)
     {
       return null;
     }
     workbook = app.Workbooks.Open(excelFilePath, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, 
       oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong);
     // Read data into the DataTable Middle- Start  
     sheets = workbook.Worksheets;
     Excel.Worksheet worksheet = (Excel.Worksheet)sheets.get_Item(1);// Read the 1 Table 
     if (worksheet == null)
       return null;
     string cellContent;
     int iRowCount = worksheet.UsedRange.Rows.Count;
     int iColCount = worksheet.UsedRange.Columns.Count;
     Excel.Range range;
     // Responsible for column header Start
     DataColumn dc;
     int ColumnID = 1;
     range = (Excel.Range)worksheet.Cells[1, 1];
     while (iColCount >= ColumnID)
     {
       dc = new DataColumn();
       dc.DataType = System.Type.GetType("System.String");
       string strNewColumnName = range.Text.ToString().Trim();
       if (strNewColumnName.Length == 0) strNewColumnName = "_1";
       // Determine whether column names are duplicate 
       for (int i = 1; i < ColumnID; i++)
       {
         if (dt.Columns[i - 1].ColumnName == strNewColumnName)
           strNewColumnName = strNewColumnName + "_1";
       }
       dc.ColumnName = strNewColumnName;
       dt.Columns.Add(dc);
       range = (Excel.Range)worksheet.Cells[1, ++ColumnID];
     }
     //End
     // Data greater than 500 Bar, using multiple processes to read data 
     if (iRowCount - 1 > 500)
     {
       // Start multithreaded data reading 
       // New Thread 
       int b2 = (iRowCount - 1) / 10;
       DataTable dt1 = new DataTable("dt1");
       dt1 = dt.Clone();
       SheetOptions sheet1thread = new SheetOptions(worksheet, iColCount, 2, b2 + 1, dt1);
       Thread othread1 = new Thread(new ThreadStart(sheet1thread.SheetToDataTable));
       othread1.Start();
       // Obstruction  1  Milliseconds, guarantee the first 1 Read  dt1
       Thread.Sleep(1);
       DataTable dt2 = new DataTable("dt2");
       dt2 = dt.Clone();
       SheetOptions sheet2thread = new SheetOptions(worksheet, iColCount, b2 + 2, b2 * 2 + 1, dt2);
       Thread othread2 = new Thread(new ThreadStart(sheet2thread.SheetToDataTable));
       othread2.Start();
       DataTable dt3 = new DataTable("dt3");
       dt3 = dt.Clone();
       SheetOptions sheet3thread = new SheetOptions(worksheet, iColCount, b2 * 2 + 2, b2 * 3 + 1, dt3);
       Thread othread3 = new Thread(new ThreadStart(sheet3thread.SheetToDataTable));
       othread3.Start();
       DataTable dt4 = new DataTable("dt4");
       dt4 = dt.Clone();
       SheetOptions sheet4thread = new SheetOptions(worksheet, iColCount, b2 * 3 + 2, b2 * 4 + 1, dt4);
       Thread othread4 = new Thread(new ThreadStart(sheet4thread.SheetToDataTable));
       othread4.Start();
       // The main thread reads the remaining data 
       for (int iRow = b2 * 4 + 2; iRow <= iRowCount; iRow++)
       {
         DataRow dr = dt.NewRow();
         for (int iCol = 1; iCol <= iColCount; iCol++)
         {
           range = (Excel.Range)worksheet.Cells[iRow, iCol];
           cellContent = (range.Value2 == null) ? "" : range.Text.ToString();
           dr[iCol - 1] = cellContent;
         }
         dt.Rows.Add(dr);
       }
       othread1.Join();
       othread2.Join();
       othread3.Join();
       othread4.Join();
       // Appends data read by multiple threads to the  dt1  Back 
       foreach (DataRow dr in dt.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt.Clear();
       dt.Dispose();
       foreach (DataRow dr in dt2.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt2.Clear();
       dt2.Dispose();
       foreach (DataRow dr in dt3.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt3.Clear();
       dt3.Dispose();
       foreach (DataRow dr in dt4.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt4.Clear();
       dt4.Dispose();
       return dt1;
     }
     else
     {
       for (int iRow = 2; iRow <= iRowCount; iRow++)
       {
         DataRow dr = dt.NewRow();
         for (int iCol = 1; iCol <= iColCount; iCol++)
         {
           range = (Excel.Range)worksheet.Cells[iRow, iCol];
           cellContent = (range.Value2 == null) ? "" : range.Text.ToString();
           dr[iCol - 1] = cellContent;
         }
         dt.Rows.Add(dr);
       }
     }
     wath.Stop();
     TimeSpan ts = wath.Elapsed;
     // Read data into the DataTable Middle- End
     return dt;
   }
   catch
   {
     return null;
   }
   finally
   {
     workbook.Close(false, oMissiong, oMissiong);
     System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
     System.Runtime.InteropServices.Marshal.ReleaseComObject(sheets);
     workbook = null;
     app.Workbooks.Close();
     app.Quit();
     System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
     app = null;
     GC.Collect();
     GC.WaitForPendingFinalizers();
   }
 }

(3) Read Excel in NPOI mode (this method has not been tested)

NPOI is the. NET version of the POI project. POI is an open source Java reading and writing Excel, WORD and other Microsoft OLE2 component documentation project. With NPOI you can read and write WORD/EXCEL documents on machines that do not have Office installed or in the corresponding environment.

Advantages: The speed of reading Excel is fast, and the operation of reading mode is flexible

Disadvantages: You need to download the corresponding plug-in and add it to the system reference.


 /// <summary>
 ///  Will excel Import the data in the DataTable Medium 
 /// </summary>
 /// <param name="sheetName">excel Workbook sheet Name of </param>
 /// <param name="isFirstRowColumn"> No. 1 1 Whether the row is DataTable Column name of </param>
 /// <returns> Returned DataTable</returns>
 public DataTable ExcelToDataTable(string sheetName, bool isFirstRowColumn)
 {
   ISheet sheet = null;
   DataTable data = new DataTable();
   int startRow = 0;
   try
   {
     fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
     if (fileName.IndexOf(".xlsx") > 0) // 2007 Version 
       workbook = new XSSFWorkbook(fs);
     else if (fileName.IndexOf(".xls") > 0) // 2003 Version 
       workbook = new HSSFWorkbook(fs);
     if (sheetName != null)
     {
       sheet = workbook.GetSheet(sheetName);
     }
     else
     {
       sheet = workbook.GetSheetAt(0);
     }
     if (sheet != null)
     {
       IRow firstRow = sheet.GetRow(0);
       int cellCount = firstRow.LastCellNum; //1 At the end of the line 1 A cell Number of   That is, the total number of columns 
       if (isFirstRowColumn)
       {
         for (int i = firstRow.FirstCellNum; i < cellCount; ++i)
         {
           DataColumn column = new DataColumn(firstRow.GetCell(i).StringCellValue);
           data.Columns.Add(column);
         }
         startRow = sheet.FirstRowNum + 1;
       }
       else
       {
         startRow = sheet.FirstRowNum;
       }
       // Finally 1 Label of column 
       int rowCount = sheet.LastRowNum;
       for (int i = startRow; i <= rowCount; ++i)
       {
         IRow row = sheet.GetRow(i);
         if (row == null) continue; // Rows without data default to null 　　　　　　　 
         
         DataRow dataRow = data.NewRow();
         for (int j = row.FirstCellNum; j < cellCount; ++j)
         {
           if (row.GetCell(j) != null) // Similarly, cells without data default to null
             dataRow[j] = row.GetCell(j).ToString();
         }
         data.Rows.Add(dataRow);
       }
     }
     return data;
   }
   catch (Exception ex)
   {
     Console.WriteLine("Exception: " + ex.Message);
     return null;
   }
 }

The following are some related articles, you can refer to the following