Comments:
It’s fascinating to watch this following the FSD v12 demo ride on HW3. So cool that the hardware works having been designed and produced years ago.
Here after the biography
Such fucking incompetent shits @Tesla... 4 years later... 2 million AP recalls, idiots. And FSD is nowhere close to complete. Such bullshittery from Elon Musk.
Eerie to watch it, even now
Yes sir, quite impressive for a self-sustainable car globally
This road will be the best road in the world for trade, investment, and innovation.
Brick by brick 🧱
📍Fertitta College of Medicine
📍Baylor College of Medicine
March 2024, still waiting. Getting closer though 😅
I love my Tesla, I love what this guy is doing for humanity, I'm sold on his company, he is truly and honestly in our timeline a walking genius from another galaxy.
Any minute 🥱
What is the tune called?
Watching the FSD introduction after 4 years is incredible now that V12 is released and is proving to be mind-blowing!
Elon's predictions are super accurate... never bet against a Time Traveling Alien 😂
I love Tesla: its designs, its safety, its comfort, its acceleration, and on top of that its technology is just wow. Long live Tesla!!!
As a tech nerd, this video was awesome. So sad that just a few people will understand what is being shown here, and most certainly most investors won't get this.
Hey Elon, it's next year... actually 5 years later. Where's the robotaxi that you promised?
Dear Tesla Shareholders - Vote by 6/13/2024 in the Tesla proxy vote: "For"/Yes on 1-5 and No on 6-12. Save business governance, boards, and shareholders' rights! Let's go, Tesla shareholders!
Crazy how it's been 4 years already.
Watching this video as Xpeng just announced it's dropping LiDAR.
I hope Robotaxi Day still happens on 8/8. And we get to see some Robotaxis on the road before the end of 2024. 🎉
TESLA is THE FUTURE. As long as the SUN is shining, Tesla will be alive and kicking (in this case, driving, flying and digging).
Over 5 years later and Hardware 3 is performing better than ever!!! Incredible.
24 comments in the last 10 months. Jeez.
Doing good, guys! All I have to do is steer the car and it does the rest!!
From 2024 and seeing the results of Tesla 😂😂
2024-09-22 - 5 years later, we have "Human Supervised Full Self-Driving".
“Next year [2020] for sure, we will have over a million robotaxis on the road.” Um, no.
Good old times when we just believed every lie Elon told to pump the stock price.
Had to rewatch the FSD section at the end to actually get the details that were lacking from the We Robot event lol
This video gets better every year. 🎉❤
Epic presentation. Fast forward 5 years: the software caught up with the hardware and the predictions came true.
Many claims about near-term “feature-complete” autonomy and a huge robotaxi rollout by 2020–2021 have slipped. Yet Tesla has advanced “FSD Beta” to a point where it can handle a wide range of real-world roads, albeit requiring supervision. The fundamental theme remains the same: continuing to refine neural networks via massive real-world data and over-the-air updates—pushing toward a vision of autonomous fleets, even if it’s taking longer than originally projected.
Real-time constraints refer to the requirement that a system must complete its data processing and produce results (for example, steering or braking commands) within a strict, bounded time—often just milliseconds—so it can respond to the world as events happen. In other words, the system can’t be significantly delayed or it becomes unsafe or useless.
In the context of self-driving cars or other safety-critical applications, real-time constraints mean:
The sensors (cameras, radar, etc.) capture data continuously (e.g., 30–60 times per second).
The system—running its perception and decision-making software—must process each batch of sensor data fast enough to keep up with the flow of new data.
By the time the next sensor frame or batch arrives, the system’s previous computations must be done so the car can immediately adjust its steering, speed, or braking as the environment changes around it.
Why Real-Time Constraints Matter
Safety and Accuracy: A delay of even a few hundred milliseconds in recognizing a pedestrian or a sudden lane intrusion can be the difference between avoiding an accident and hitting an obstacle.
Continuous Control: Self-driving cars need to produce a continuous stream of control outputs (steering angle, acceleration, braking commands). If the AI is late to output these commands, the car can drift or fail to slow down in time.
Hardware Load: Real-time processing demands specialized hardware (such as Tesla’s FSD computer) and optimized software pipelines (e.g., efficient neural-network inference) so the computation can be done within tight time budgets.
System Design Trade-offs:
Compute Power vs. Complexity: More complex models can give higher accuracy but take longer to process each frame. The system has to be carefully designed so it doesn’t exceed the strict latency budget.
Energy Efficiency: Fast compute usually consumes more power; in an electric car, that impacts range. A design must balance speed with power constraints.
Example Time Budget in a Self-Driving Stack
Below is a hypothetical breakdown showing how real-time constraints might look over 100 milliseconds (ms) per frame:
Sensor Capture (Cameras, Radar): ~5–10 ms.
Neural Network Inference (Object Detection, Lane Detection, etc.): ~10–30 ms.
Sensor Fusion / Tracking: ~5–10 ms.
Planning & Control Algorithms: ~5–10 ms.
Safety & Redundancy Checks: ~1–5 ms.
Actuator Command: Must complete before the next frame arrives or before the situation changes significantly.
Each segment must complete reliably within those windows. If you slip past 100 ms total, you start dropping frames or “running behind,” and the car’s view of the world lags reality.
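To make the budget concrete, here is a minimal Python sketch of a per-frame latency check. All stage timings are the hypothetical ones from the breakdown above, and the 100 ms deadline is just the example window, not a figure Tesla has published.

```python
# Hypothetical per-frame latency budget check for a perception/control pipeline.
# Stage names and timings mirror the example breakdown above; they are illustrative only.

FRAME_BUDGET_MS = 100.0  # the ~100 ms example window from the breakdown above

stage_latencies_ms = {
    "sensor_capture": 8.0,      # camera/radar readout
    "nn_inference": 25.0,       # object/lane detection networks
    "sensor_fusion": 7.0,       # tracking and fusion
    "planning_control": 8.0,    # trajectory and control commands
    "safety_checks": 3.0,       # redundancy / plausibility checks
}

def check_frame_budget(latencies_ms, budget_ms):
    total = sum(latencies_ms.values())
    slack = budget_ms - total
    for stage, ms in latencies_ms.items():
        print(f"{stage:>16}: {ms:5.1f} ms")
    print(f"{'total':>16}: {total:5.1f} ms (slack {slack:+.1f} ms)")
    if slack < 0:
        # In a real system this is where frames get dropped or the planner runs on stale data.
        print("WARNING: over budget -- the car's view of the world now lags reality")

check_frame_budget(stage_latencies_ms, FRAME_BUDGET_MS)
```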
An end-to-end process—from data collection to custom hardware to continuous over-the-air updates—is extremely resource-intensive and operationally complex. Academics typically aim at core algorithmic research or narrowly scoped challenges that can be published and replicated in controlled conditions. Meanwhile, companies like Tesla invest heavily across the entire value chain, showing every step from sensors to AI training to deployment at scale, something rarely feasible in a traditional university environment.
Elon combines an overarching, often idealistic worldview (e.g., “multi-planetary species”) with concrete, hands-on engineering achievements. This balance of practical execution and future-minded thinking is arguably a key driver of his impact and high public profile.
An overview of Tesla’s Full Self-Driving (FSD) Computer (HW3), focusing on the key design goals, the custom chip architecture, and how it all ties into the broader Autopilot/FSD system. Much of this information was originally presented during Tesla’s April 2019 Autonomy Day event by Pete Bannon (VP of Hardware Engineering) and Elon Musk.
1. System-Level Overview
Form Factor & Placement
The FSD Computer is a drop-in replacement for Tesla’s previous “HW2/HW2.5” Autopilot board. It sits behind the glovebox, connecting to the same camera/radar/ultrasonic harnesses.
Designed so that it can be retrofitted (where applicable) into existing vehicles that purchased Tesla’s FSD software option.
Dual-Computer Redundancy
Each board actually has two fully independent FSD computers—often called “A” side and “B” side.
They each have their own power supply and separate data paths. This ensures that if one fails, the other can still safely operate the vehicle.
Camera/Signal Inputs
Eight camera feeds, plus radar and ultrasonics, flow into each of the two computers. Both computers process all inputs in parallel for redundancy and cross-checking.
Each camera feed can run at up to 36–50 frames per second (depending on the network load).
Power Budget
The entire board targets under 100 W total consumption—important for electric vehicles, where every watt affects range.
In practice, under normal driving loads, Tesla cites ~72 W of consumption per board (and ~15 W specifically attributed to neural-network processing).
2. The Custom-Designed FSD Chip
At the heart of each FSD Computer lies a Tesla-designed system-on-chip (SoC), fabricated on a 14 nm FinFET process by Samsung. Here’s a breakdown of its main components:
CPU Complex
Twelve 64-bit CPU cores (Arm-based), running at up to ~2.2 GHz.
Provides general-purpose processing for tasks like planning, data logging, and control logic.
~2.5× the CPU performance vs. Tesla’s previous NVIDIA-based solution.
Neural Network Accelerators
Two custom “NNA” cores on each SoC, each delivering up to 36 TOPS (trillions of 8-bit integer operations per second). Combined is ~72 TOPS for the chip.
Each NNA has:
A 96×96 “multiply–accumulate” (MAC) array for dense matrix operations (convolution/deconvolution).
Dedicated on-chip SRAM (~32 MB total per accelerator) to cache weights/activations and avoid frequent DRAM reads.
The design focuses heavily on 8-bit integer math for energy efficiency (versus larger floating-point bitwidths).
This is the primary engine for Tesla’s real-time vision networks (object detection, segmentation, depth estimation, etc.).
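The ~36 TOPS figure follows directly from the MAC array size and clock. A quick back-of-envelope check, assuming the ~2 GHz NNA clock cited at Autonomy Day and counting each MAC as two operations (multiply plus add):

```python
# Back-of-envelope check of the per-accelerator TOPS figure.
# Assumes a 96x96 MAC array at ~2 GHz, with each MAC counted as 2 ops (multiply + add).
macs_per_cycle = 96 * 96          # 9,216 multiply-accumulate units
ops_per_mac = 2                   # multiply + accumulate
clock_hz = 2.0e9                  # ~2 GHz NNA clock (assumption based on the Autonomy Day talk)
tops = macs_per_cycle * ops_per_mac * clock_hz / 1e12
print(f"~{tops:.1f} TOPS per accelerator")   # ~36.9 TOPS, consistent with the ~36 TOPS figure
```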
GPU
A modest embedded GPU (Arm Mali or similar) used mainly for post-processing or other tasks that are less specialized than what the NNA accelerators handle.
Provides ~600 GFLOPs (32-bit) performance—significantly less raw ML throughput than the NNAs, but useful for miscellaneous parallelizable tasks.
Image Signal Processor (ISP)
Handles front-end tasks like HDR merges, tone mapping, and noise reduction on camera feeds.
Outputs more “cleaned up” frames for the neural networks to process.
Memory Subsystem
Dual-channel LPDDR4-4266 DRAM, providing a peak of ~68 GB/s memory bandwidth.
The large on-chip SRAM buffers reduce how often the system must fetch from external DRAM.
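The ~68 GB/s figure is consistent with the stated DRAM speed if one assumes a 128-bit-wide interface (two 64-bit channels); that bus-width assumption is mine, not from the text:

```python
# Rough check of the peak DRAM bandwidth figure.
# Assumes a 128-bit-wide LPDDR4 interface (two 64-bit channels) at 4266 MT/s.
transfers_per_s = 4266e6          # LPDDR4-4266 data rate per pin
bus_width_bytes = 128 // 8        # 16 bytes per transfer across the assumed full interface
bandwidth_gb_s = transfers_per_s * bus_width_bytes / 1e9
print(f"peak bandwidth ~= {bandwidth_gb_s:.1f} GB/s")   # ~68.3 GB/s
```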
Safety & Security Blocks
Includes lockstep safety CPUs that act as final arbiters of actuator commands (steering, throttle, brake).
A hardware security module (HSM) ensures only Tesla-signed firmware/software can run.
3. Redundant Board Architecture
Two SoCs per Board
The autopilot computer board itself has two identical SoCs, each with its own DRAM, flash storage, and power domain.
Each SoC runs the full Autopilot or FSD software stack independently.
Cross-Checking Outputs
The two systems exchange final “plan” messages—if they diverge, the system can automatically degrade safely or re-check.
Minimizes the chance of a single hardware or software fault causing an unsafe scenario.
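A highly simplified sketch of the cross-check idea follows. Tesla's actual arbitration logic, message formats, and fallback behavior are not public, so every name and tolerance here is hypothetical:

```python
# Toy illustration of dual-computer cross-checking. All names and tolerances are hypothetical;
# the real arbitration and fail-over logic is not public.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Plan:
    steering_deg: float   # commanded steering angle
    accel_mps2: float     # commanded acceleration (negative = braking)

def plans_agree(a: Plan, b: Plan, steer_tol=1.0, accel_tol=0.5) -> bool:
    """Compare the two independently computed plans within loose tolerances."""
    return (abs(a.steering_deg - b.steering_deg) <= steer_tol
            and abs(a.accel_mps2 - b.accel_mps2) <= accel_tol)

def arbitrate(plan_a: Plan, plan_b: Plan) -> Optional[Plan]:
    if plans_agree(plan_a, plan_b):
        return plan_a      # plans match: act on the primary side's plan
    return None            # divergence: degrade safely (alert driver, re-check, etc.)

cmd = arbitrate(Plan(2.1, 0.30), Plan(2.0, 0.25))
print("actuate" if cmd else "fall back to safe state")
```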
4. Performance Gains vs. Previous Hardware
Compared to NVIDIA’s PX2 (HW2.5):
Tesla’s in-house solution provides roughly an order of magnitude more raw ML compute—in the ballpark of 144 TOPS total (72 TOPS per SoC × 2 SoCs).
Achieves faster real-time inference for bigger, more complex neural networks while staying within the same approximate power envelope.
Cost Savings
Musk/Bannon stated the new FSD board costs roughly 80% of what the older NVIDIA-based solution did.
Lower cost + higher performance = crucial for scaling into millions of vehicles.
5. Real-Time Constraints & Software Architecture
Input Rate & Latency
With 8 cameras at ~30–50 FPS, the system must process hundreds of millions of pixels per second—plus radar, ultrasonics, and car-state data.
The neural nets and post-processing must complete in under typical real-time windows (e.g., ~100 ms) to ensure safe control updates.
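The "hundreds of millions of pixels per second" claim is easy to sanity-check. The per-camera resolution below (~1.2 MP) is an assumption for illustration; actual resolutions and frame rates vary by camera position and software version:

```python
# Rough pixel-throughput estimate for the camera suite.
cameras = 8
pixels_per_frame = 1280 * 960        # ~1.2 MP per camera (assumption, for illustration)
fps = 36
pixels_per_second = cameras * pixels_per_frame * fps
print(f"~{pixels_per_second/1e6:.0f} million pixels/s")   # ~354 million pixels/s
```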
Neural Network Compilation
Tesla wrote custom compilers to take trained PyTorch/TensorFlow models and optimize them for the NNA instructions (8-bit conv, pooling, activation layers).
Aggressive layer fusion, memory layout optimization, and batch-size=1 capabilities (to minimize latency) are part of the stack.
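As one concrete example of the kind of fusion such a compiler performs, folding a BatchNorm layer into the preceding convolution's weights removes an entire pass over memory at inference time. This is a generic, minimal NumPy sketch of that math, not Tesla's compiler (which is not public):

```python
# Folding BatchNorm parameters into a preceding convolution (inference-time fusion).
# Illustrative only: real compilers fuse many more patterns (conv+ReLU, pooling, etc.).
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into the preceding conv's weights/bias.
    w: (out_ch, in_ch, kh, kw) conv weights, b: (out_ch,) conv bias."""
    scale = gamma / np.sqrt(var + eps)           # per-output-channel scale factor
    w_folded = w * scale[:, None, None, None]    # scale each output channel's kernel
    b_folded = (b - mean) * scale + beta         # fold the mean/offset into the bias
    return w_folded, b_folded

# Tiny smoke test with random parameters for 8 output channels.
rng = np.random.default_rng(0)
w, b = rng.standard_normal((8, 3, 3, 3)), rng.standard_normal(8)
gamma, beta, mean, var = rng.standard_normal(8), rng.standard_normal(8), rng.standard_normal(8), rng.random(8)
print(fold_bn_into_conv(w, b, gamma, beta, mean, var)[0].shape)   # (8, 3, 3, 3)
```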
Over-the-Air Updates
Tesla continuously refines and updates the networks. The FSD computer can handle these increasingly large models over time, thanks to the chip's compute headroom.
6. Why It’s Unusual in the Auto Industry
Vertical Integration:
Most automakers rely on third-party chips from NVIDIA, Intel/Mobileye, or Qualcomm. Tesla’s custom approach is unique—giving them full control over hardware, software, and design trade-offs.
Tailored for NN Workloads:
Instead of a general-purpose GPU or CPU, Tesla’s design prioritizes deep convolutional layers, small-batch inference, and large on-chip caches.
Minimizes overhead and power usage, crucial for an electric vehicle’s range and real-time safety demands.
An overview of Tesla’s vision-only strategy for Full Self-Driving (FSD): why they decided to avoid LiDAR and HD mapping, and how they rely on cameras plus neural networks to perceive the world.
1. The Rationale for Vision-First
Human Analogy
Tesla’s argument is that humans operate vehicles effectively with only two eyes (analogous to cameras) and a brain (analogous to a neural network).
If biologically inspired vision is sufficient for human-level driving, an advanced camera-based AI can, in principle, replicate or exceed that capability.
Richness of Camera Data
Cameras provide full color, texture, and shape information across wide fields of view. This is more detail than LiDAR point clouds, which primarily yield precise distance measurements but less texture.
The assumption: With enough neural-network horsepower and training examples, cameras can accurately deduce not just object classification but also depth and velocity—mirroring human perception.
Cost and Complexity
LiDAR sensors (especially automotive-grade) can be expensive and bulky.
Adding LiDAR and mapping infrastructure increases BOM (Bill of Materials) costs. Tesla’s goal is mass-market affordability; removing LiDAR helps keep vehicle prices lower.
Fewer sensors and minimal pre-mapped data means a simpler supply chain and fewer single points of failure.
No Reliance on High-Definition Maps
Some competitors use HD maps to pre-store road geometry (lanes, traffic lights, road signs). Tesla views this as fragile—if the road changes, HD maps become outdated.
Instead, Tesla’s “occupancy+vision” approach updates the environment in real time. The car’s perception system sees changes (construction, new traffic patterns) on-the-fly.
2. How Tesla Handles Depth & Motion Without LiDAR
Stereo Overlap & Multi-Camera Geometry
Even though not all Tesla cameras are strictly “stereo pairs,” several cameras overlap fields of view. Neural networks can do geometry-based reasoning (structure from motion) across frames and overlapping angles.
The car’s forward-facing cameras, for example, have partially overlapping views, allowing parallax depth cues.
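For the overlapping-camera case, the classical disparity-to-depth relationship gives a feel for why overlap helps. A minimal sketch (the focal length, baseline, and disparity values are made up, and Tesla's cameras are not a calibrated stereo rig; its networks learn depth cues rather than applying this formula directly):

```python
# Classical stereo relation: depth = focal_length * baseline / disparity.
# Values below are purely illustrative.
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(focal_px=1000.0, baseline_m=0.15, disparity_px=5.0))  # ~30 m
```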
Monocular Depth Estimation
Neural networks can learn depth directly from single-camera frames by associating visual cues (object size, occlusions, perspective lines) with known real-world scales.
Radar, when included, can provide an extra reference to validate the neural network’s depth guesses—but Tesla has also begun phasing out radar, doubling down on pure vision.
Temporal Fusion (“Video Networks”)
By analyzing multiple sequential frames (video feed), the system can track how objects move over time.
This “motion parallax” approach is akin to how many animals with minimal stereo vision still sense depth by moving their heads (structure-from-motion).
Training with Real Driving Data
Tesla leverages its fleet to gather billions of miles of real, varied driving scenarios. When the car changes lanes or brakes, the resulting motion data can “label” distances and velocities for the neural net (sensor fusion, auto-labeling).
This massive scale of data feeds into Tesla’s “data engine,” continuously refining the vision models to better estimate depth, speed, and object boundaries.
3. Advantages Claimed by Tesla
Generalization to All Environments
Vision-based AI, if done well, can handle everything from highways to city streets to rural roads without depending on specialized HD maps.
Tesla argues LiDAR-based solutions often rely on a restricted “geo-fenced” region or meticulously mapped roads.
Continuous Updates
Because the system relies on real-time camera feeds, it can adapt instantly to new or dynamic changes (construction zones, lane re-striping, closed roads), rather than waiting for map updates.
Scalability and Cost Reduction
Fewer sensors keep costs lower and manufacturing simpler. This helps Tesla push autonomy into high-volume EV production.
The hope is that wide adoption yields more driving data, improving the neural networks, creating a flywheel effect of scale → data → better perception → more scale.
4. Criticisms and Challenges
Difficult Edge Cases
Critics point out situations like fog, heavy rain, blinding sun glare, night driving in unlit areas—conditions where LiDAR can sometimes see better if vision is partially occluded.
Tesla’s response is that their cameras + neural networks can handle these or degrade gracefully, and that radar or “hybrid” approaches do not fix all corner cases.
Regulatory & Public Skepticism
Many see LiDAR as a “guarantee” for real distance accuracy. Regulators may feel more comfortable with multi-sensor redundancy.
Tesla contends advanced vision is enough, but adoption could hinge on demonstrating extremely robust safety data.
High Compute Demand
Processing raw camera streams for 360° coverage at ~30–50 FPS is computationally heavy. Hence Tesla’s custom FSD computer with dedicated neural network accelerators.
As these networks get more sophisticated, Tesla must keep optimizing hardware/software to maintain real-time performance under a practical power budget.
5. Driving Principles of Vision-Only Development
Software 2.0 Mindset
According to Andrej Karpathy, Tesla is writing much of its “autonomy logic” in the form of deep neural networks, trained from data rather than hand-coded. Cameras are the main sensor input.
The car’s “understanding” of objects, lanes, drivable space emerges from these networks rather than explicit rules or 3D map references.
Fleet Feedback (“Shadow Mode”)
Even in purely vision-based mode, Tesla can gather mispredictions (e.g., times the driver intervenes) to identify new corner cases. Then the system is retrained with newly labeled clips.
The premise is that billions of real-world frames from across diverse geographies trump any artificially generated LiDAR maps or small test fleets.
Eliminating Handcrafted Dependencies
Tesla aims for a single, universal vision + AI approach that can in principle handle any city or highway in the world.
They see HD maps and LiDAR as partial solutions that risk failing if roads are altered or the map is out of date.
An overview of Tesla’s “Data Engine” and overall fleet-learning approach to building and iterating on Autopilot / Full Self-Driving (FSD). This is arguably one of Tesla’s key differentiators: a system that constantly refines its AI based on real-world data collected from hundreds of thousands (eventually millions) of vehicles.
1. What Is the “Data Engine”?
At a high level, the data engine is an iterative pipeline that:
Deploys Tesla’s neural networks (in a “shadow mode” or active mode) onto the fleet.
Identifies and flags interesting or tricky driving scenarios where the network might have disagreed with human actions or been uncertain.
Uploads short snippets of relevant camera/radar data to Tesla’s servers for further inspection.
Labels or auto-labels that data, integrating it into Tesla’s training set.
Retrains the network with these new “hard” examples, then redeploys the updated network via over-the-air update.
This loop runs continuously, enabling Tesla to home in on rare or difficult corner cases and improve its AI faster than a purely simulation-based or small-scale approach.
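A toy, runnable sketch of one iteration of that loop. Everything here is a stand-in: "clips" are dicts, the trigger is a trivial disagreement check, and labeling just copies the driver's action; the real pipeline spans fleet firmware, backend services, and large training clusters.

```python
# Toy sketch of one data-engine iteration: trigger on shadow/driver disagreement,
# auto-label the flagged clips, and fold them into the training set.
import random

def shadow_disagrees(clip):
    # Placeholder trigger: flag clips where the shadow prediction differs from the driver.
    return clip["shadow_pred"] != clip["driver_action"]

def auto_label(clip):
    # Placeholder labeling: the driver's actual action becomes the training label.
    return {"frames": clip["frames"], "label": clip["driver_action"]}

def data_engine_iteration(fleet_clips, training_set):
    flagged = [c for c in fleet_clips if shadow_disagrees(c)]   # triggered uploads
    training_set.extend(auto_label(c) for c in flagged)         # hard examples join the dataset
    return len(flagged)                                         # retraining/OTA rollout would follow

random.seed(0)
clips = [{"frames": i, "shadow_pred": random.choice("LRS"),
          "driver_action": random.choice("LRS")} for i in range(1000)]
training_set = []
print(data_engine_iteration(clips, training_set), "hard clips added")
```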
2. Key Components of the Data Engine
A. Shadow Mode
Definition: When Tesla rolls out a new or experimental neural network (e.g., for lane changes, pedestrian detection), they can run it “passively” in the car without actually controlling driving decisions.
Purpose: Compare how the new model’s predictions differ from the currently active model or human driver. This helps Tesla see if the new model would have performed better or worse in real scenarios.
Triggered Uploads: Whenever the shadow-mode network’s prediction differs significantly from reality (e.g., driver takes a turn while shadow net would have gone straight), the car flags that segment and sends it back for analysis.
B. Triggered Event Collection
Edge Cases & Interventions: Tesla specifically looks for scenarios where the driver intervenes with Autopilot, taps the brake, or otherwise overrides. These events often signal that Autopilot or the shadow network may have been on the wrong track.
Non-Random Sampling: Tesla does not upload every second of video from millions of cars—that would be infeasible. Instead, it relies on smart triggers for corner cases, near misses, or anomalies.
Privacy Considerations: Tesla anonymizes data; the system aims to send just the minimal snippet around the triggered event (like ~10–30 seconds).
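A toy illustration of "send only the snippet around the trigger": the ~10–30 s window comes from the text above, while the buffer representation and function names are invented.

```python
# Toy snippet extraction around a trigger event. Frame timestamps are in seconds.
def extract_snippet(frame_timestamps, trigger_t, before_s=10.0, after_s=10.0):
    """Return only the timestamps within [trigger - before_s, trigger + after_s]."""
    return [t for t in frame_timestamps if trigger_t - before_s <= t <= trigger_t + after_s]

timestamps = [i * 0.033 for i in range(30 * 60 * 30)]   # ~30 minutes of 30 FPS frames
snippet = extract_snippet(timestamps, trigger_t=600.0)  # e.g., driver intervened at t = 10 min
print(len(snippet), "of", len(timestamps), "frames uploaded")
```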
C. Labeling & Auto-Labeling
Human Annotation: For complex scenes—e.g., construction zones, bizarrely shaped vehicles, road debris—teams of human labelers draw bounding boxes, lanes, or segmentation masks.
Sensor-Assisted Labeling: Tesla can use radar data or the car’s motion (e.g., if it slows down for an obstacle, that obstacle’s range is somewhat “labeled” automatically). This speeds up training data creation.
Auto-Labeling Pipelines: If hundreds (or thousands) of fleet clips show the same scenario, Tesla’s algorithms can triangulate and fuse them in a “virtual reconstruction,” generating 3D labels with minimal human intervention.
D. Training at Scale
Recurrent / Video-Based Networks: Once labeled, clips feed into large-scale training clusters. Tesla uses GPUs (and has hinted at an internal “Dojo” supercomputer in development) to handle massive volumes of data.
Iterative Approach: The newly trained network is tested on validation sets and, if it outperforms the old network, it replaces it in the next over-the-air (OTA) software update.
Continuous Improvement: Over time, the network “learns” to handle previously troublesome scenarios. Rare events become part of the data distribution if enough triggers occur.
E. Deployment & OTA Updates
Fleet-Wide or Staged Rollouts: Tesla can do phased releases—e.g., Early Access Program owners get new versions first, then broader distribution if reliability is confirmed.
Closed-Loop Feedback: Once deployed, the network again runs in active or partial shadow mode, finding further corner cases. The cycle repeats.
3. Why This Matters for AI Development
Real-World Data Is Hard to Beat
Simulations can’t capture every odd situation—weather quirks, unpredictable human drivers, random road debris, unusual vehicles, etc.
By leveraging hundreds of thousands of vehicles, Tesla can gather orders of magnitude more diverse data than most competitors.
Focus on Edge Cases
Instead of storing terabytes of normal highway driving, Tesla zeroes in on “interesting” anomalies, which are most valuable for training.
This focus significantly reduces the manual labeling burden and accelerates improvement in the riskiest scenarios.
Rapid Iteration
Because Tesla controls everything (hardware, software, over-the-air system), they can push a new release, see how it performs, gather data within days, train, and push out a better release.
Traditional automakers rely heavily on dealer visits or supplier-based update processes, limiting how quickly they can improve.
Data Network Effect
As Tesla sells more cars, the potential “sensor network” grows, collecting more corner cases. This forms a positive feedback loop: more cars → more data → better AI → more attractive product → more sales.
4. Concrete Examples
Cut-in Detection Network
Tesla launched a cut-in predictor that sees when cars from neighboring lanes suddenly merge in front of the Tesla.
Initially, they ran it in shadow mode to gather real highway merges. They collected a massive dataset of merges, labeled them automatically (knowing when merges actually happened), and retrained.
After sufficient validation, it went live, letting Autopilot anticipate merges earlier and slow down more smoothly.
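The auto-labeling trick in this example is that the future reveals the label: if a neighboring-lane car did end up in the ego lane a few seconds later, the earlier frames get labeled as a cut-in. A toy sketch of that idea (thresholds, frame rate, and data layout are all invented):

```python
# Toy auto-labeling of cut-in events: look a few seconds ahead and check whether the
# tracked car's lateral offset entered the ego lane. All numbers are illustrative.
LANE_HALF_WIDTH_M = 1.8
LOOKAHEAD_S = 3.0
FPS = 30

def label_cut_in(lateral_offsets_m, frame_idx):
    """lateral_offsets_m[i] = tracked car's lateral distance from the ego-lane center at frame i."""
    horizon = frame_idx + int(LOOKAHEAD_S * FPS)
    future = lateral_offsets_m[frame_idx:horizon]
    return any(abs(x) < LANE_HALF_WIDTH_M for x in future)   # True -> label this frame as a cut-in

# Car drifts from the adjacent lane (~3.5 m away) toward the ego lane over a few seconds.
track = [3.5 - 0.03 * i for i in range(150)]
print(label_cut_in(track, frame_idx=0))   # True: the cut-in happens within the lookahead window
```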
Trash or Debris in Road
Even relatively rare occurrences (like a tire fragment or plastic bag) can be learned: whenever a Tesla driver swerves, the shadow system flags potential unknown object. If cameras confirm an obstacle, that clip is uploaded.
Labeled in the back end, retrained, re-deployed—over time, the network recognizes more and more roadside debris types.
5. Challenges & Critiques
Sheer Scale
Handling millions of trigger events, labeling them accurately, and managing retraining is logistically massive. Tesla has built extensive infrastructure for this—still a constant effort.
Reliance on Human Drivers as “Teachers”
The approach works best if the human driver’s interventions generally reflect the correct driving response. It assumes Tesla owners drive responsibly (or at least predictably).
Edge Cases beyond “Rare”
Some extremely rare events (freak accidents, bizarre vehicle mods) might not come up enough times to generalize well. Tesla tries to mitigate this by focusing on a wide global fleet.
6. The Future of Fleet Learning
Dojo Supercomputer:
Elon Musk and Andrej Karpathy have teased a high-performance “Dojo” training system to process massive video datasets more efficiently.
This could speed up the “data engine” loop by crunching ever larger, more complex 3D labeling tasks in less time.
Increasingly “Hands-Off” Data Collection:
As autonomy features mature, more real-world control decisions feed back into the data engine, possibly leading toward fully driverless scenarios.
Over time, Tesla aims to rely less on manual labeling and more on advanced auto-labeling of 3D reconstructions from fleet data.
An overview of the neural-network architectures and training methods Tesla discussed around the Autonomy Day (2019) timeframe. Andrej Karpathy’s presentation specifically showed how Tesla’s AI stack processes eight camera feeds, learns from real-world fleet data, and fuses everything into perception outputs (object detection, lane lines, paths) in real time.
1. Multi-Camera, Multi-Task Architecture
Multiple Camera Streams
Tesla uses eight cameras with overlapping fields of view to achieve a 360° view around the car. Each camera can run at up to ~36–50 FPS.
Historically, each camera feed was processed by dedicated neural networks or sub-networks. Over time, Tesla has moved toward more integrated “multi-camera” networks that fuse features.
“Trunk + Heads” Setup (High-Level Concept)
A shared “backbone” or “trunk” of convolutional layers ingests images to learn generic visual features (edges, shapes, textures).
Multiple “heads” branch out to specialized tasks:
Object detection: Cars, trucks, pedestrians, cyclists, traffic cones, etc.
Lane line & road boundary detection (segmentation or parametric polylines).
Traffic light & sign classification.
Depth or distance estimation (3D bounding boxes).
Drivable space / free-space segmentation.
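A minimal PyTorch sketch of the shared-trunk / multiple-heads idea. Layer sizes and the particular head set are made up; Tesla's actual architectures are far larger and not public.

```python
# Minimal "shared trunk + task heads" sketch in PyTorch. Sizes and tasks are illustrative.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared backbone: generic visual features reused by every task head.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Task-specific heads branch off the shared features.
        self.object_head = nn.Conv2d(64, 10, 1)     # e.g., per-cell objectness/class scores
        self.lane_head = nn.Conv2d(64, 2, 1)        # e.g., lane-line segmentation
        self.freespace_head = nn.Conv2d(64, 1, 1)   # e.g., drivable-space mask

    def forward(self, images):
        feats = self.trunk(images)
        return {
            "objects": self.object_head(feats),
            "lanes": self.lane_head(feats),
            "free_space": torch.sigmoid(self.freespace_head(feats)),
        }

net = MultiTaskNet()
out = net(torch.randn(1, 3, 128, 256))          # batch size 1, as in real-time inference
print({k: tuple(v.shape) for k, v in out.items()})
```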
Video / Temporal Fusion
Early networks were frame-by-frame (single-image inferences). Tesla began moving to spatiotemporal (video) networks, where multiple sequential frames improve velocity estimation and handle occlusions or sudden cut-ins.
This might involve LSTM-like or 3D convolutional layers, or custom “temporal difference” modules. The essence: each object can be tracked across frames, which helps with consistent labeling of speed, direction, and occupancy.
2. Key Technical Choices
8-Bit Integer Inference
The networks are trained in full-precision (FP32 or mixed precision) but compiled down to 8-bit integer ops for inference on Tesla’s custom FSD chip.
This requires advanced calibration (quantization) so the model accuracy remains high while enjoying the efficiency and throughput of integer ops.
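A bare-bones sketch of the quantization math behind 8-bit inference, using simple symmetric per-tensor scaling. Tesla's actual calibration scheme is not public; real toolchains typically calibrate per layer or per channel.

```python
# Symmetric int8 quantization of a weight tensor, plus dequantization, to show the
# precision/efficiency trade-off in its simplest form.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0          # map the largest magnitude to +/-127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 32).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(dequantize(q, s) - w).max())   # small relative to the weight range
```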
Batch Size = 1 for Real-Time
Typical AI training might use large batch sizes to maximize GPU efficiency, but in a self-driving car, latency is critical. Tesla processes single frames as they arrive rather than waiting to accumulate a batch.
This speeds up each inference pass, ensuring fresh camera data is turned into driving decisions within tens of milliseconds.
Layer Fusion & Compiler Optimizations
Tesla wrote a custom compiler that merges consecutive layers (like convolution + activation + pooling) into a single kernel to minimize memory reads/writes.
As Pete Bannon described, each neural-network instruction on the FSD chip can directly trigger specialized hardware for convolution, pooling, ReLU, etc., reducing overhead.
Multi-Task Learning
Instead of training a separate network for every single output (lanes, objects, free space, etc.), Tesla uses shared feature extractors to let the net learn a richer representation. This helps each task benefit from the others (e.g., object detection can leverage the same “road context” features that lane detection uses).
3. Data Labeling & Training at Scale
Automated Label Generation
Karpathy illustrated how sensor data (radar or the vehicle’s motion) can label distances automatically. For instance, if the car passes an obstacle, the network can learn that object’s actual depth.
This cuts down on purely manual annotation, crucial when dealing with millions of frames across the Tesla fleet.
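A toy version of letting another signal supply the label: attach a radar range measurement to a camera detection so it becomes a "free" depth label. The association logic here is deliberately naive and every name is invented; production pipelines are far more careful about matching and outliers.

```python
# Toy sensor-assisted labeling: pair each camera detection with the nearest radar return
# in bearing, and use the radar range as the depth label.
def auto_label_depth(detections, radar_returns, max_bearing_diff_deg=2.0):
    """detections: [(det_id, bearing_deg)]; radar_returns: [(bearing_deg, range_m)]."""
    labels = {}
    for det_id, det_bearing in detections:
        best = min(radar_returns, key=lambda r: abs(r[0] - det_bearing), default=None)
        if best and abs(best[0] - det_bearing) <= max_bearing_diff_deg:
            labels[det_id] = best[1]          # radar range becomes the depth label
    return labels

dets = [("car_1", 1.5), ("car_2", -10.0)]
radar = [(1.2, 42.0), (-9.5, 18.5), (30.0, 60.0)]
print(auto_label_depth(dets, radar))   # {'car_1': 42.0, 'car_2': 18.5}
```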
Huge Real-World Datasets
Because Tesla vehicles are on public roads daily, they capture edge cases—random debris, odd merges, unusual vehicles—en masse.
The “data engine” flags mispredictions or near misses in shadow mode, and those clips get labeled and reintroduced into training. This method actively focuses on tough scenarios rather than trivial data.
Iterative Release & Validation
Once a newly trained model outperforms the existing model on internal tests, Tesla pushes it either to a small “early access” fleet or in “shadow” to compare side-by-side.
If real-world metrics improve (fewer interventions, better alignment with human steering), that network eventually goes live. The cycle repeats, continually refining performance.
4. Network Outputs & Post-Processing
2D to 3D Fusion
Detecting bounding boxes in a camera image is a 2D problem—yet the car needs to know 3D positions and velocities. Tesla’s network can produce 3D bounding boxes directly (width, length, orientation, etc.), or it uses sensor fusion to refine them.
Early versions used radar or “structure from motion” across frames to anchor distance/velocity. Tesla has since moved toward “pure vision,” substituting careful multi-camera geometry for radar signals.
Bird’s-Eye View / Top-Down Representations
Internally, Tesla transforms camera-based detections into a top-down “vector space” so path-planning logic can easily reason about lanes, roads, and objects around the car.
The neural network can produce per-pixel “occupancy grids” or instance segmentations that feed this top-down representation.
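A tiny sketch of rasterizing detections into a top-down occupancy grid. The grid extents, resolution, and input format are all arbitrary choices for illustration, not Tesla's representation.

```python
# Toy top-down occupancy grid: mark grid cells covered by detected objects, given their
# positions in ego-centric meters.
import numpy as np

GRID_M = 80.0        # 80 m x 80 m area around the car
CELL_M = 0.5         # 0.5 m cells -> a 160 x 160 grid

def rasterize(detections_xy_m):
    n = int(GRID_M / CELL_M)
    grid = np.zeros((n, n), dtype=np.uint8)
    for x, y in detections_xy_m:                       # x forward, y left, ego at the center
        col = int((x + GRID_M / 2) / CELL_M)
        row = int((y + GRID_M / 2) / CELL_M)
        if 0 <= row < n and 0 <= col < n:
            grid[row, col] = 1                          # cell occupied
    return grid

grid = rasterize([(12.0, -3.5), (25.0, 0.0), (5.0, 2.0)])
print("occupied cells:", int(grid.sum()))
```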
Planning & Control
After perception is established (what’s around the car?), a separate planning system decides how to steer, brake, or accelerate. Over time, Tesla aims to embed more planning logic inside the neural network itself (“end to end” learning). But as of 2019–2020, they still used a mix of learned perception + classical path-planning heuristics.
5. “Software 2.0” Concept
Karpathy’s Philosophy:
The FSD stack heavily uses neural nets, so a lot of “driving logic” is no longer coded by hand (Software 1.0) but “trained” from data (Software 2.0).
Instead of writing thousands of edge-case rules, Tesla invests in data collection, labeling, and neural architectures that “learn” the rules implicitly.
Constant Redesign:
As neural networks solve more tasks reliably, Tesla removes old heuristic code. The “end state” is a smaller codebase orchestrating large neural nets that handle the bulk of perception and eventually part of planning.
6. Challenges & Evolution
Computational Load
Larger networks yield better accuracy but demand more processing power. Tesla’s FSD chip (HW3) was sized to handle big CNNs. Yet as the models grow (spatiotemporal, 3D occupancy), Tesla invests in more efficient architectures and possibly next-gen hardware (HW4, “Dojo” for training).
Robustness Across Conditions
Vision-based models must handle adverse weather, nighttime, harsh lighting. Tesla relies heavily on fleet data to systematically expose the nets to these conditions.
Deeper Integration of Time (Video Networks)
The 2019 event introduced the concept of video-based networks, but Tesla has since evolved to incorporate more “temporal memory” so the net can track objects through partial occlusion, merges, etc.
The only thing is a minivan or van can have 8 or 12 passengers lol, a Tesla with 11 seats 💺 would be sweet 👀
POV: watching this in 2025 thinking about FSD cars
Innovative technology
1 month until unsupervised FSD
Does anybody know the name of the intro music?
2025 here. When are Tesla owners going to start making their promised $30,000 a year?
"we expect to have the first operating robotaxis next year" LOLLLL
"two years. [Dramatic Pause.] and if we need to accelerate that, we can always just delete parts~~[Smirking.]"
GOD I can't wait for the movie about this fraudster. Even his GRANDFATHER was famous for selling snake oil.