If you regularly work with large files and obscure file formats, you know that general AI platforms like Claude and ChatGPT run out of tokens fast when you try to upload a big file.
What File Formats & Sizes Can Top AI Platforms Accept?
This guide compares the file formats and sizes accepted by ChatGPT, Gemini, and Claude with those handled by Lium’s AI platform, which is built for big, complex data.
ChatGPT accepts files up to 512 MB. That doesn’t work for people in industries that rely on dense, complex data for informed decision-making. To make matters worse, ChatGPT and similar platforms can’t accept file formats like NetCDF and SEG-Y that are commonly used in advanced industries.
So what file sizes and formats can Claude, Gemini, and ChatGPT accept? And is there an alternative that can process larger, less common file formats for the industries that need them? Here’s what to know about the file processing capabilities of today’s general AI platforms, and the solutions that Lium offers for larger datasets (for free!).
Let’s break down each platform’s file processing capabilities:
ChatGPT
What File Formats Does ChatGPT Always Accept?
You can upload documents, spreadsheets, and images directly to ChatGPT, including the following file formats:
- Documents: DOCX, PDF, TXT, RTF
- Spreadsheets: XLSX, XLS, CSV
- Presentations: PPTX
- Images: JPG, WEBP, PNG
- Code Files: PY, JS, HTML, TSV, SQL, etc.
- Audio Files: MP3
Other file types can sometimes be interpreted as well, but not with full reliability.
When asked directly about formats that aren’t fully supported, ChatGPT responded that there are additional file types it may be able to crawl, “but parsing/analysis reliability varies based on codec, encoding, container structure, corruption, size or platform implementation,” meaning it may not be able to answer your questions based on the data provided.
What File Formats Does ChatGPT Not Accept?
For advanced sectors that use obscure file formats, ChatGPT often cannot crawl or interpret the data. When asked about file formats it cannot process, OpenAI’s LLM called out many that are commonly used in these advanced industries:
- Geospatial / GIS: .gdb, .mdb, .sid, .ecw, .e00, .adf, .3mx, .slpk, large .las/.laz point clouds
- Satellite / Remote Sensing: .nitf/.nsif, .SAFE, .he5, .img (ERDAS), .pix, proprietary SAR collections, hyperspectral cubes
- Energy / Utilities: .sav, .dyr, .aux, Petrel project files, CMG reservoir simulation files, ECLIPSE reservoir models (.DATA, .GRID, .UNRST), SEG-D seismic files
- Engineering / CAD: .rvt, .max, .blend, .prt, .par, .asm, .jt, .cgr, complex BIM coordination models
- Scientific Research: .root, .sav (SPSS), .zsav, .nii/.nii.gz, .czi, .lif, .nd2, large microscopy stacks
- Finance / Quantitative: Bloomberg .bbg, Reuters proprietary feeds, kdb+/q databases, proprietary tick databases, market replay archives
- Manufacturing / Industrial: .acd, .zap, Siemens TIA Portal project files, Fanuc robot programs, industrial historian archives, PLC firmware images
- Life Sciences / Biotech: .cram, .gff, .gtf, .loom, .h5ad, .cif, molecular dynamics trajectory files (.dcd, .xtc)
Areas where the platform commonly struggles include proprietary binaries, compiled artifacts, and hardware-dependent formats.
Even when these formats are crawled successfully, ChatGPT often can’t semantically understand the content, meaning it can’t provide the fully accurate answers your complex questions demand.
What Maximum File Sizes Can ChatGPT Accept?
- General file size: 512 megabytes (MB) maximum for any file, including documents, spreadsheets, code, and proprietary files.
- Text documents: 2 million tokens maximum, or about 1,500 standard pages of text.
- Image files: 20 MB per image (for both direct uploads and embedded visual content).
If you’re in a data-dense sector, the 512 MB limit means you can’t even upload large documents and spreadsheets, let alone complex proprietary datasets.
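As a rough illustration, the limits listed above can be turned into a quick pre-upload check. This is a minimal sketch, not an official OpenAI tool: the function name `check_upload`, the suffix sets, the use of binary megabytes, and the ratio of about 4 characters per token are all assumptions made for the example.

```python
from pathlib import Path

# Limits as quoted in this article; a snapshot, not an official specification.
MB = 1024 * 1024  # assumption: binary megabytes
GENERAL_LIMIT_BYTES = 512 * MB
IMAGE_LIMIT_BYTES = 20 * MB
TEXT_TOKEN_LIMIT = 2_000_000

# Assumption: about 4 characters (bytes) per token for plain English text.
CHARS_PER_TOKEN = 4

# Hypothetical suffix groupings chosen for this example.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
TEXT_SUFFIXES = {".txt", ".csv", ".tsv", ".sql", ".py", ".js", ".html"}


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return the quoted limits this file would exceed (empty list if none)."""
    problems = []
    suffix = Path(name).suffix.lower()
    if size_bytes > GENERAL_LIMIT_BYTES:
        problems.append("over the 512 MB general file limit")
    if suffix in IMAGE_SUFFIXES and size_bytes > IMAGE_LIMIT_BYTES:
        problems.append("over the 20 MB image limit")
    if suffix in TEXT_SUFFIXES and size_bytes // CHARS_PER_TOKEN > TEXT_TOKEN_LIMIT:
        problems.append("likely over the 2 million token limit")
    return problems


if __name__ == "__main__":
    samples = [("survey.sgy", 600 * MB), ("map.png", 25 * MB), ("log.csv", 10 * MB)]
    for name, size in samples:
        print(name, check_upload(name, size) or "within quoted limits")
```

Under the assumed 4-characters-per-token ratio, 2 million tokens is only about 8 MB of plain text, so for text-heavy files the token cap is likely to be reached long before the 512 MB size cap.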
Claude
Which File Types Can Be Crawled & Processed by Claude?
Claude can accept the following file formats (as long as they’re below 500 MB):
- Documents: PDF, DOCX, TXT, HTML, ODT, RTF, EPUB, JSON
- Spreadsheets: CSV, XLSX
- Presentations: PPTX
- Images: JPEG, PNG, GIF, WebP
- Code: Python (.py), JavaScript (.js), TypeScript (.ts), and other common code file formats
What File Sizes Can Claude Accept?
Even the “accepted” file types listed above cannot be processed beyond 500 MB as of June 2026.
Within a project, the limit on Anthropic’s Claude is 30 MB per file, with unlimited uploads as long as the content fits within Claude’s context window.
What File Formats is Claude Unable to Upload?
Claude runs into challenges processing and interpreting a wide range of file types commonly used in advanced sectors, including:
- Geospatial / GIS: .shp, .shx, .dbf, .tif/.tiff, .kml/.kmz, .las/.laz, .gdb, .sid
- Satellite / Remote Sensing: .ntf, .h5/.hdf5, .nc, .fits, .grb/.grib2
- Energy / Utilities: .osh, .xml (CIM), .raw, .las (well logs), ECLIPSE simulation files
- Engineering / CAD: .dwg, .dxf, .rvt, .stp/.step, .igs, .stl, .x_t/.x_b, .CATPart/.CATProduct
- Scientific Research: .mat, .rds/.rda/.rdata, .h5, .parquet, .dcm, .mrc, .nii
- Finance / Quantitative: FIX logs, .bbg, .tick, HDF5 time series archives
- Manufacturing / Industrial: .prt, .sldprt/.sldasm, .l5x/.acd, OPC-UA exports
- Life Sciences / Biotech: .fastq/.fasta, .bam/.sam, .vcf, .mol/.sdf, .pdb
Claude can process some of these file types, while others are rejected outright. Even for the ones it accepts, Anthropic’s LLM itself admits that while it “can receive [some of these file formats] and extract whatever plain text or metadata is readable, [it] cannot process the actual data structure, spatial relationships, or specialized encoding.”
That doesn’t work for advanced sectors that require fully accurate and multimodal decision-making.
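To make the gap between "readable text" and "actual data structure" concrete, here is a toy sketch. It is not a description of how Claude works internally; the file layout (a text header followed by four packed 32-bit floats) and the helper `readable_text` are hypothetical and exist only for illustration.

```python
import re
import struct


def readable_text(blob: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII, similar to the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, blob)]


# Hypothetical binary file: a short text header followed by four float32 values.
header = b"TEMP_K grid 2x2\x00"
values = (271.5, 272.0, 273.25, 274.0)
blob = header + struct.pack("<4f", *values)

# Plain-text extraction recovers the header label but none of the measurements.
print(readable_text(blob))

# Recovering the numbers requires knowing the layout (little-endian float32).
print(struct.unpack("<4f", blob[-16:]))
```

A tool that only scans for readable text sees the label but not the grid values, which is the kind of loss the quote above describes for specialized encodings.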
Gemini
What Are the Standard File Formats Supported by Gemini?
Google Gemini is able to process the following file types:
- Documents: PDF, DOCX, TXT, HTML
- Spreadsheets: XLSX, CSV
- Presentations: PPTX, Slides export
- Images: JPG, PNG, WebP, SVG
- Audio: WAV, MP3, FLAC
- Video: MP4, MOV
- Code: .py, .js, .java, .cpp, .html
- ZIP (images/frames): .zip
What is Gemini’s Maximum File Size Limit?
When uploading files to the Gemini App, keep the following restrictions in mind:
- Documents: 100 MB per file and a cap of 10 files per prompt
- Images: 100 MB per file
- Videos: Up to 2 gigabytes (GB) and a 5-minute length on the basic plan
- Audio: 100 MB per file and up to a 10-minute length on the basic plan
- Code & ZIP files: 100 MB with a max of 5,000 files within a single archive
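The archive restriction in the list above can be checked locally before uploading. The following is a minimal sketch using Python's standard `zipfile` module; the names `check_zip` and `build_zip` are hypothetical helpers, the limits are the ones quoted in this article, and treating 100 MB as binary megabytes is an assumption.

```python
import io
import zipfile

# Limits as quoted in this article for code and ZIP uploads.
MAX_ARCHIVE_BYTES = 100 * 1024 * 1024  # assumption: binary megabytes
MAX_ARCHIVE_FILES = 5_000


def check_zip(data: bytes) -> list[str]:
    """Return the quoted archive limits that this ZIP would exceed."""
    problems = []
    if len(data) > MAX_ARCHIVE_BYTES:
        problems.append("archive is over 100 MB")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Count real files only; directory entries are skipped.
        file_count = sum(1 for info in archive.infolist() if not info.is_dir())
    if file_count > MAX_ARCHIVE_FILES:
        problems.append(f"archive holds {file_count} files (max 5,000)")
    return problems


def build_zip(n_files: int) -> bytes:
    """Build a small in-memory ZIP with n_files tiny entries (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for i in range(n_files):
            archive.writestr(f"frame_{i:05d}.txt", "x")
    return buffer.getvalue()


if __name__ == "__main__":
    print(check_zip(build_zip(3)))
    print(check_zip(build_zip(5_001)))
```

For a file on disk, the same check could be run on the bytes returned by `Path(...).read_bytes()`.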
What Advanced File Formats Cannot Be Fully Processed & Integrated by Gemini?
To crawl and fully interpret the following data types in Gemini, you’ll need to manually convert them to CSV or JSON (and the result must still fit within the maximum file size).
If the file is still in its native format, or is too big, you can’t work with the following advanced file formats in Gemini:
- Geospatial / GIS: .shp, .shx, .dbf, .kml/.kmz, .gdb, .tif/.tiff, .las/.laz, .sid
- Engineering / CAD: .stp/.step, .igs/.iges, .stl, .x_t/.x_b, .prt, .sldprt/.sldasm, .rvt, .dwg/.dxf
- Scientific Research: .mat, .rds/.rda, .h5/.hdf5, .parquet, .dcm, .nii, .mrc
- Life Sciences / Biotech: .fastq/.fasta, .bam/.sam, .vcf, .mol/.sdf, .pdb
- Manufacturing / Industrial: .l5x/.acd, OPC-UA exports, ECLIPSE simulation files, .osh
- Satellite / Remote Sensing: .ntf, .nc, .fits, .grb/.grib2
- Finance / Quantitative: .bbg, .tick, FIX logs
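For text-based formats in the list above, the manual conversion step might look like the sketch below, which flattens FASTA records into CSV using only the Python standard library. The function name `fasta_to_csv` and the chosen column names are assumptions for the example; binary formats such as .bam or .h5 would need domain-specific libraries instead.

```python
import csv
import io


def fasta_to_csv(fasta_text: str) -> str:
    """Convert FASTA text into CSV with id, description, sequence, length."""
    records = []
    header = None
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "description", "sequence", "length"])
    for header, sequence in records:
        seq_id, _, description = header.partition(" ")
        writer.writerow([seq_id, description, sequence, len(sequence)])
    return out.getvalue()


if __name__ == "__main__":
    sample = ">seq1 sample one\nACGT\nAC\n>seq2\nGG\n"
    print(fasta_to_csv(sample))
```

Note that the converted CSV can end up larger than the source file, so it still has to be checked against the size limits above.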
Can Lium Process & Interpret Large, Complex File Formats?
Yes. While advanced industries hit roadblocks with general AI platforms’ inability to process and interpret most complex datasets, Lium was built with those industries and their proprietary data in mind.
When you connect ANY file format, regardless of size, Lium automatically indexes the selected files so that they can be interpreted by the answer engine within moments. That means all of your proprietary data can be fetched by Lium in its original file format, unlike general AI, which requires the contents to be converted to an acceptable format and uploaded to the platform.
Not convinced? Try Lium yourself for free. Complex, domain-specific file formats that Lium regularly crawls and extracts insights from (without the need for format conversion) include RDF, NetCDF, GRIB2, HDF5, SEG-Y, CCSDS, BUFR, BIM/IFC, FASTQ, FITS. The file is the input.
Why General AI’s Inability to Crawl These File Types & Sizes is Only Part of the Problem (& How Lium Solves It)
Even if platforms like ChatGPT, Claude, and Gemini could process your dataset, your proprietary data is often so advanced, and requires such a nuanced understanding of your sector, that these platforms can’t answer questions with the precision and depth you need. And when general AI treats every file as a one-session interaction, you can’t use it for multimodal reasoning that draws on multiple datasets at once.
Lium’s advanced industry AI was built not only to process large, complex datasets, but to interpret them multimodally and provide a POV from an industry expert’s perspective.
Here’s where general AI hits roadblocks, and how Lium was built to deliver accurate results with industry-specific context:
Data-dense Reasoning:
LLMs carry strong general knowledge, but the expertise that drives critical decisions inside a specialized industry lives in internal data, institutional processes, and domain-specific knowledge accumulated over years.
Models like ChatGPT typically treat uploaded files as a reference while the model continues to reason from its general training on public Internet data. That is NOT the same as reasoning over your own data directly.
What You Need: True Proprietary Reasoning, Not Public Bias
Lium works directly within your proprietary data environment. It surfaces answers that exist entirely within your data, rather than answers from general AI that can be skewed by publicly crawled data.
One-Session Interactions with Data
General AI tools like Claude, ChatGPT, and Gemini have announced updates that allow for memory, moving toward session continuity and persistent context. This is meaningful progress IF you’re using AI for general knowledge tasks. But when you’re working with advanced proprietary data, it still isn’t cutting it.
Memory is not the same as a purpose-built data environment. When your organization’s workflows depend on proprietary datasets that are indexed, structured, and refined to accurately reflect the realities of your business’s unique operations and pain points, general memory features fall short. They carry context forward from session to session, but do NOT build a compounding, reusable knowledge base around your data.
You Need AI Built for Advanced Data
Lium is designed to bring nuanced understanding to the complex questions in high-stakes advanced sectors. Once data is integrated into your workflow and complex questions are asked and answered within Lium’s environment, the knowledge base grows.
Whether you’re working with geospatial, subsurface, financial, or any type of complex data, you can run that workflow without any development background. Lium’s answer engine reflects the deep sector knowledge your organization’s proprietary data offers, not a general model's approximation of it.
See Lium’s Ability to Crawl Any File Type in a Moment
Don’t let general AI’s inability to read and reason with obscure file types hold you back from getting fast answers to your most complicated problems.
Sign up for free today to see Lium’s ability to crawl even the most complex data sets and provide 100% accurate answers with nuanced understanding of your advanced industry.

Written by Harrison Kelly
Technology Writer
Harrison Kelly is a B2B SEO & Content Marketing Consultant and freelance writer with more than a decade of experience working and writing for technology companies. Notable software brands that Harrison has published work for include Zendesk, SkyFi satellites, GovPilot, Classmates.com, and Belong Home. He graduated from The College of New Jersey with a business degree. He uses artificial intelligence daily to solve complex problems and speed up routine processes.





